In this paper, we consider a minimax approach to managing an inventory under distributional uncertainty.
In particular, we study the associated multistage distributionally robust optimization problem, when only
the mean, variance, and distribution support are known for the demand at each stage. It is known that if
the policy maker is allowed to recompute her policy choice after each stage (i.e. dynamic formulation), thus
taking prior realizations of demand into consideration when performing the relevant minimax calculations
at later stages, a basestock policy is optimal. In contrast, if the policy maker is not allowed to recompute her
policy after each stage (i.e. static formulation), far less is known. If these two formulations have a common
optimal policy, i.e. the policy maker would be content with the given policy whether or not she has the power
to recompute after each stage, we say that the policy is time consistent, and the problem is weakly time
consistent. If every optimal policy for the static formulation is time consistent, we say that the problem is
strongly time consistent. In this paper, we give sufficient conditions for weak and strong time consistency. We
also provide several examples demonstrating that the problem is not time consistent in general. Furthermore,
these examples show that the question of time consistency can be quite subtle in this setting. In particular,
we show that: (i) the problem can fail to be weakly time consistent, (ii) the problem can be weakly but not
strongly time consistent, and (iii) the problem can be strongly time consistent even if the two formulations
have different optimal values. Interestingly, this stands in contrast to the analogous setting in which only
the mean and support of the demand distribution are known at each stage, for which it is known that such
time inconsistency cannot occur (Shapiro 2012).
1. Introduction

The news vendor (news boy) problem, used to analyze the trade-offs associated with stocking an
inventory, has its origin in a seminal paper by Edgeworth (1888). In its classical formulation, the
problem is stated as a minimization of the expected value of the relevant ordering, backorder, and
holding costs. Such a formulation requires a complete specification of the probability distribution of
the underlying demand process. However, in applications knowledge of the exact distribution of the
demand process is rarely available. This motivates the study of minimax type (i.e. distributionally
robust) formulations, where minimization is performed with respect to a worst-case distribution
from some family of potential distributions (cf. Dupačová (1987, 2001), Prékopa (1995), Žáčková
(1966)). In a pioneering paper Scarf (1958) gave an elegant solution for the minimax news vendor
problem when only the first and second moments of the demand distribution are known. His work
has led to considerable follow-up work, and we refer the reader to Gallego and Moon (1993) and
Gallego, Katircioglu and Ramachandran (2007), and references therein, for related results. For a
more general overview of risk analysis for news vendor and inventory models we refer, e.g., to
Ahmed, Cakmak and Shapiro (2007) and Choi, Ruszczyński and Zhao (2011).

* Research of this author was partly supported by the NSF award CMMI 1232623.

Xin, Goldberg, and Shapiro: Distributionally robust multistage inventory models with moment constraints

In practice an inventory must often be managed over some time horizon, and the classical
news vendor problem was naturally extended to the multistage setting, for which there is also
a considerable literature (see Zipkin (2000) and references therein). Recently, robust variants of
such multistage problems have begun to receive attention in the literature. It has been observed
that such multistage robust optimization problems can exhibit a subtle phenomenon known as
time inconsistency (e.g., Riedel (2004), Artzner et al. (2007), Carpentier et al. (2012),
Huang et al. (2011), Ruszczyński (2010), Shapiro (2009)).



Various formal definitions of time consistency have been proposed in the literature. We will give
a formal definition, which is naturally suited to our framework, in Section 4. The intuition is as
follows. A multistage distributionally robust optimization problem can be viewed in two ways. In
one formulation, the policy maker is allowed to recompute her policy choice after each stage (i.e.
dynamic formulation), thus taking prior realizations of demand into consideration when performing
the relevant minimax calculations at later stages, in which case it is known that a basestock policy
is optimal. In the second formulation, the policy maker is not allowed to recompute her policy after
each stage (i.e. static formulation), in which case far less is known. If these two formulations have
a common optimal policy, i.e. the policy maker would be content with the given policy whether
or not she has the power to recompute after each stage, we say that the policy is time consistent,
and the problem is weakly time consistent. If every optimal policy for the static formulation is time
consistent, we say that the problem is strongly time consistent. Such a property is desirable from
a policy perspective, as it ensures that previously agreed upon policy decisions still make sense
when the policy is actually implemented, possibly at a later time. This approach is in line with
the original work of Bellman (1957) on dynamic programming. In the context of distributionally
robust formulations for the news vendor problem, it was recently shown that if all one knows about
the demand at each stage is the support and first moment, then time inconsistency cannot occur,
in an appropriate sense (cf. Shapiro (2012)). However, beyond these results, very little is known.

In this paper, we extend the work of Scarf (1958) by considering the multistage inventory (news
vendor) problem when the support and first two moments are known for the demand at each stage.
We give sufficient conditions for weak and strong time consistency. We also provide several examples
demonstrating that the problem is not time consistent in general. Furthermore, these examples
show that the question of time consistency can be quite subtle in this setting. In particular, we
show that: (i) the problem can fail to be weakly time consistent, (ii) the problem can be weakly but
not strongly time consistent, and (iii) the problem can be strongly time consistent even if the two
formulations have different optimal values. Interestingly, this stands in contrast to the analogous
setting in which only the mean and support of the demand distribution are known at each stage, for
which it is known that such time inconsistency cannot occur (Shapiro 2012).

The structure of the rest of the paper is as follows. In Section 2 we review the setup and known
results for the single stage setting, including the results of Scarf (1958), and provide new results,
which extend Scarf's analysis of the single stage distributionally robust news vendor problem to a
new family of objective functions. In Section 3 we formally introduce the multistage distributionally
robust news vendor problem, and review the static and dynamic formulations. In Section 4, which
comprises the main results of the paper, we explore the notion of time consistency for the multistage
distributionally robust news vendor problem. In Subsection 4.1 we give sufficient conditions for
weak and strong time consistency. In Subsection 4.2 we give several examples, which demonstrate
that the problem of interest is not time consistent in general, and further show that the question
of time consistency can be quite subtle. We provide concluding remarks and directions for future
research in Section 5. Also, we include a technical appendix in Section 6.

2. Single stage formulation 

In this section we review both the classical and distributionally robust single stage formulations,
including some relevant results of Scarf (1958). We also extend Scarf's approach to provide a new
explicit solution for a particular class of objective functions, which we will later use in our study
of time consistency.

2.1. Classical formulation 

Consider the following classical formulation of the news vendor problem:

$$\min_{x \ge 0} \mathbb{E}[\Psi(x, D)], \tag{1}$$

where

$$\Psi(x, d) := cx + b[d - x]_+ + h[x - d]_+, \tag{2}$$

and $c, b, h$ are the ordering, backorder penalty, and holding costs, per unit, respectively. Unless
stated otherwise we assume that $b > c > 0$ and $h > 0$. The expectation is taken with respect to
the probability distribution of the demand $D$, which is modeled as a random variable (r.v.) having
nonnegative support. It is well known that this problem has the closed form solution
$x = F^{-1}\left(\frac{b - c}{b + h}\right)$,
where $F(\cdot)$ is the cumulative distribution function (cdf) of the demand $D$, and $F^{-1}$ is its inverse. Of
course, it is assumed here that the probability distribution, i.e. the cdf $F$, is completely specified.

2.2. Distributionally robust formulation

Suppose now that the probability distribution of the demand $D$ is not fully specified, but instead
assumed to be a member of a family of distributions $\mathfrak{M}$. Then we instead consider the following
distributionally robust formulation:

$$\min_{x \ge 0} \psi(x), \tag{3}$$

where

$$\psi(x) := \sup_{Q \in \mathfrak{M}} \mathbb{E}_Q[\Psi(x, D)], \tag{4}$$

and the notation $\mathbb{E}_Q$ emphasizes that the expectation is taken with respect to the distribution $Q$
of the demand $D$.

We now introduce some additional notation to describe certain families of distributions. For a
probability measure (distribution) $Q$, we let supp($Q$) denote the support of the measure, i.e. the
smallest closed set $A \subset \mathbb{R}$ such that $Q(A) = 1$. With a slight abuse of notation, for a r.v. $Z$, we
also let supp($Z$) denote the support of the associated probability measure. For a given closed (and
possibly unbounded) interval $\mathcal{I} \subset \mathbb{R}$, we let $\mathfrak{P}(\mathcal{I})$ denote the set of probability distributions $Q$ such
that supp($Q$) $\subset \mathcal{I}$. Although we will be primarily interested in the setting that $\mathcal{I} \subset \mathbb{R}_+$ (i.e. demand
is nonnegative), it will sometimes be convenient for us to consider more general families of demand
distributions. By $\delta_a$ we denote the probability measure of mass one at $a \in \mathbb{R}$.

In this paper, we will study families of distributions satisfying moment constraints of the form

$$\mathfrak{M} := \{Q \in \mathfrak{P}(\mathcal{I}) : \mathbb{E}_Q[D] = \mu,\ \mathbb{E}_Q[D^2] = \mu^2 + \sigma^2\}. \tag{5}$$

Unless stated otherwise, it will be assumed that $\mathfrak{M}$ is indeed of the form (5), and is nonempty. We
let $\alpha$ denote the left-endpoint of $\mathcal{I}$ (or $-\infty$ if $\mathcal{I}$ is unbounded from below), and let $\beta$ denote the
right-endpoint of $\mathcal{I}$ (or $\infty$ if $\mathcal{I}$ is unbounded from above); i.e., $\mathcal{I} = [\alpha, \beta]$. It may be easily verified
that, in case of a bounded interval $[\alpha, \beta]$, the set $\mathfrak{M}$ is nonempty iff the following conditions hold

$$\mu \in [\alpha, \beta] \quad \text{and} \quad \sigma^2 \le (\beta - \mu)(\mu - \alpha), \tag{6}$$

which will be assumed throughout. Furthermore, one can also identify conditions under which $\mathfrak{M}$
is a singleton.
Observation 1 If $-\infty < \alpha < \beta < \infty$, $\mu \in [\alpha, \beta]$, and $\sigma^2 = (\beta - \mu)(\mu - \alpha)$, then $\mathfrak{M}$ consists of the
single probability measure which assigns to the point $\alpha$ probability $p = \frac{\beta - \mu}{\beta - \alpha}$, and to the point $\beta$
probability $1 - p = \frac{\mu - \alpha}{\beta - \alpha}$.
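
The nonemptiness condition (6) and the two-point measure of Observation 1 can be checked numerically. The sketch below is only an illustration: the helper names and the numbers $\mu = 2$, $[\alpha, \beta] = [0, 5]$ are assumptions, and the code simply confirms that the two-point measure reproduces the prescribed first and second moments in the boundary case of (6).

```python
def moment_family_is_nonempty(mu, sigma2, alpha, beta):
    """Condition (6) for a bounded interval [alpha, beta]."""
    return alpha <= mu <= beta and sigma2 <= (beta - mu) * (mu - alpha)


def extreme_two_point_measure(mu, alpha, beta):
    """Measure of Observation 1: mass p at alpha and 1 - p at beta."""
    p = (beta - mu) / (beta - alpha)
    return {alpha: p, beta: 1.0 - p}


# Illustrative numbers (assumptions): mean 2 on the interval [0, 5].
mu, alpha, beta = 2.0, 0.0, 5.0
sigma2 = (beta - mu) * (mu - alpha)  # boundary case of (6)

q = extreme_two_point_measure(mu, alpha, beta)
mean = sum(point * prob for point, prob in q.items())
second_moment = sum(point ** 2 * prob for point, prob in q.items())
print(mean, second_moment, mu ** 2 + sigma2)
```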

Our definitions imply that for all $x \in \mathbb{R}$, $\psi(x)$ equals the optimal value of the following
maximization problem:

$$\max_{Q \in \mathfrak{P}(\mathcal{I})} \int \Psi(x, \tau)\, dQ(\tau) \quad \text{s.t.} \quad \int \tau\, dQ(\tau) = \mu, \quad \int \tau^2\, dQ(\tau) = \mu^2 + \sigma^2. \tag{7}$$

Problem (7) is a classical problem of moments (see, e.g., Landau (1987)). By the Richter-Rogosinski
Theorem (e.g., Shapiro, Dentcheva and Ruszczyński (2009), Proposition 6.40) we have the following.

Observation 2 If Problem (7) possesses an optimal solution, then it has an optimal solution with
support on at most three points.



We note that the distributionally robust single stage news vendor problem considered by
Scarf (1958) is exactly Problem (3) when the objective function $\Psi(x, d)$ is of the form (2), and $\mathcal{I} = \mathbb{R}_+$.
As it will be useful for later proofs, we briefly review Scarf's explicit solution. We actually state a
slight generalization of the results of Scarf, and for completeness we include a proof in Section 6.

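Theorem 1 Suppose that $b > c$, $c + h > 0$, $\mu > 0$, $\sigma > 0$, and $\mathcal{I} = \mathbb{R}_+$. Let $\kappa := \frac{b - h - 2c}{b + h}$. Then for
each $x \in \mathbb{R}$,

$$\psi(x) = \begin{cases} c\mu + \frac{1}{2}(b + h)\left[\left((x - \mu)^2 + \sigma^2\right)^{1/2} - \kappa (x - \mu)\right], & \text{if } x \ge \frac{\mu^2 + \sigma^2}{2\mu}; \\ b\mu + \frac{(c + h)\sigma^2 - (b - c)\mu^2}{\mu^2 + \sigma^2}\, x, & \text{if } x \in \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right); \\ b\mu - (b - c)x, & \text{otherwise.} \end{cases} \tag{8}$$

As a consequence, a complete solution to the problem $\min_{x \in \mathbb{R}_+} \psi(x)$ is as follows.

(i) If $\frac{\sigma^2}{\mu^2} > \frac{b - c}{h + c}$, then the unique optimal solution is $x = 0$, and the optimal value is $\mu b$.

(ii) If $\frac{\sigma^2}{\mu^2} < \frac{b - c}{h + c}$, then the unique optimal solution is $x = \mu + \kappa\sigma(1 - \kappa^2)^{-1/2}$, and the optimal value
is $c\mu + \left((h + c)(b - c)\right)^{1/2}\sigma$.

(iii) If $\frac{\sigma^2}{\mu^2} = \frac{b - c}{h + c}$, then all $x \in [0, \mu + \kappa\sigma(1 - \kappa^2)^{-1/2}]$ are optimal, and the optimal value is $\mu b$.

Furthermore, we note that in all cases, for all $x \in \mathbb{R}$, $\arg\max_{Q \in \mathfrak{M}} \mathbb{E}_Q[\Psi(x, D)]$ is non-empty. Also,
the optimal solution set and value of the problem $\min_{x \in \mathbb{R}} \psi(x)$ are identical to those of Problem (3),
i.e. optimizing over $x \in \mathbb{R}$, as opposed to $x \in \mathbb{R}_+$, makes no difference.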
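
The case analysis of Theorem 1 translates into a short decision rule. The following Python sketch is a hedged illustration, not the authors' code: it takes $\kappa$ to be $(b - h - 2c)/(b + h)$, the constant for which $\mu + \kappa\sigma(1 - \kappa^2)^{-1/2}$ coincides with Scarf's classical expression $\mu + \frac{\sigma}{2}\left(\sqrt{\tfrac{b-c}{h+c}} - \sqrt{\tfrac{h+c}{b-c}}\right)$. The function name and the numerical inputs are assumptions made for the example, and in the tie case (iii) the sketch returns the right endpoint of the optimal interval.

```python
import math


def scarf_order_quantity(c, b, h, mu, sigma):
    """Return (order quantity, worst-case expected cost) following cases (i)-(iii)."""
    if not (b > c and c + h > 0 and mu > 0 and sigma > 0):
        raise ValueError("requires b > c, c + h > 0, mu > 0, sigma > 0")
    if sigma ** 2 / mu ** 2 > (b - c) / (h + c):
        # Case (i): the variability is too high, ordering nothing is optimal.
        return 0.0, mu * b
    # Cases (ii) and (iii); in case (iii) this is the right endpoint of the optimal set.
    kappa = (b - h - 2.0 * c) / (b + h)
    x = mu + kappa * sigma / math.sqrt(1.0 - kappa ** 2)
    value = c * mu + math.sqrt((h + c) * (b - c)) * sigma
    return x, value


# Illustrative numbers (assumptions): c = 1, b = 9, h = 1, mean 10, standard deviation 4.
print(scarf_order_quantity(c=1.0, b=9.0, h=1.0, mu=10.0, sigma=4.0))
# High-variability case: mean 1, standard deviation 3.
print(scarf_order_quantity(c=1.0, b=9.0, h=1.0, mu=1.0, sigma=3.0))
```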

For use in later proofs, it will also be useful to demonstrate a particular variant of Theorem 1.
Suppose that in Problem (3), we are not forced to select a deterministic constant $x$, but can instead
replace $x$ by a random variable $D_1$ with any distribution $Q_1$. Specifically, let us consider the
following minimax problem:

$$\min_{Q_1 \in \mathfrak{P}(\mathbb{R})} \phi(Q_1), \tag{9}$$

where

$$\phi(Q_1) := \sup_{Q_2 \in \mathfrak{M}} \mathbb{E}_{Q_1 \times Q_2}[\Psi(D_1, D_2)],$$

and the notation $\mathbb{E}_{Q_1 \times Q_2}$ indicates that for any choices for the marginal distributions $Q_1, Q_2$ of $D_1$
and $D_2$, the expectation is taken w.r.t. the associated product measure, under which $D_1$ and $D_2$
are independent. In this case, we have the following result.

Proposition 2.1 Suppose that $b > c$, $c + h > 0$, $\mu > 0$, $\sigma > 0$, $\frac{\sigma^2}{\mu^2} > \frac{b - c}{h + c}$, and $\mathcal{I} = \mathbb{R}_+$. Then Problem
(9) has the unique optimal solution $Q_1 = \delta_0$.

Proof Since problem (9) has no constraints, it suffices to optimize over measures of mass one at
a point $x \in \mathbb{R}$. Of course, for $Q_1 = \delta_x$ we have that $\phi(Q_1) = \psi(x)$. Therefore $Q_1 = \delta_0$ is an optimal
solution of (9) by Theorem 1. For a proof of uniqueness we refer to the Appendix (Section 6). $\square$

We also note that $\psi$ inherits the property of convexity from $\Psi$.

Observation 3 If $\Psi(\cdot, d)$ is a convex function for every fixed $d \in \mathcal{I}$, then $\psi$ is also a convex
function, and Problem (3) is a convex program.

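As several of our later proofs will be based on duality theory, we now briefly review duality for
Problem (7).

2.3. Duality for Problem (7)

The dual of Problem (7) can be constructed as follows (cf. Isii (1963)). Consider the Lagrangian

$$L(Q, \lambda) := \int \Big[\Psi(x, \tau) - \sum_{i=0}^{2} \lambda_i \tau^i\Big]\, dQ(\tau) + \lambda_0 + \lambda_1 \mu + \lambda_2(\mu^2 + \sigma^2).$$

By maximizing $L(Q, \lambda)$ with respect to $Q \in \mathfrak{P}(\mathcal{I})$, and then minimizing with respect to $\lambda$, we obtain
the following Lagrangian dual for Problem (7):

$$\min_{\lambda}\ \lambda_0 + \lambda_1 \mu + \lambda_2(\mu^2 + \sigma^2) \quad \text{s.t.} \quad \lambda_0 + \lambda_1 \tau + \lambda_2 \tau^2 \ge \Psi(x, \tau), \ \ \tau \in \mathcal{I}. \tag{10}$$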

We denote by val($P$) and val($D$) the respective optimal values of the primal Problem (7) and its
dual Problem (10). By convention, if Problem (7) is infeasible, we set val($P$) $= -\infty$, and if Problem
(10) is infeasible, we set val($D$) $= \infty$. We denote by $\mathrm{Sol}_P(x)$ the set of optimal solutions of the
primal problem, and by $\mathrm{Sol}_D(x)$ the set of optimal solutions of the dual problem, and note that
these sets may be empty, even when both programs are feasible, e.g. if the respective optimal values
are approached but not attained.

Note that whenever $P$ is feasible (which we assume throughout), val($P$) $= \psi(x)$, and val($D$) $\ge$
val($P$). We now give sufficient conditions for there to be no duality gap, i.e. val($P$) $=$ val($D$), as
well as conditions for Problems (7) and (10) to have optimal solutions. By specifying known general
results for duality of such programs, e.g., (Shapiro (2001), Corollary 3.1, Propositions 3.4 and 3.5)
and (Bonnans and Shapiro (2000), Theorem 5.97), to the considered setting, we have the following.



Proposition 2.2 If $Q$ is a probability measure which is feasible for the primal Problem (7), $\lambda =
(\lambda_0, \lambda_1, \lambda_2)$ is a vector which is feasible for the dual Problem (10), and

$$\mathrm{supp}(Q) \subset \{\tau \in \mathcal{I} : \Psi(x, \tau) = \lambda_0 + \lambda_1 \tau + \lambda_2 \tau^2\}, \tag{11}$$

then $Q$ is an optimal primal solution, $\lambda$ is an optimal dual solution, and val($P$) $=$ val($D$).
Conversely, if val($P$) $=$ val($D$), and $Q$ and $\lambda$ are optimal solutions of the respective primal and dual
problems, then Condition (11) holds.



2.4. Explicit solution of Problem (7) for a class of convex, continuous, piecewise
affine functions

Scarf (1958) gave an explicit solution for Problems (7) and (10) when $\mathcal{I} = \mathbb{R}_+$, and $\Psi(x, \cdot)$ is a
convex, continuous piecewise affine function with exactly two pieces, by explicitly constructing
a feasible primal-dual solution pair satisfying the conditions of Proposition 2.2 (details of this
construction can be found in Section 6). Here, we generalize Scarf's approach to a class of convex,
continuous, piecewise affine (CCPA) functions with three pieces, as we will need the solution to
such problems for our later studies of time consistency. We note that Scarf's approach can also be
extended to the family of general CCPA functions with a general number of pieces, although we
do not pursue that here. In particular, we establish the following result, and defer the proof to the
Appendix (Section 6).

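Theorem 2 Suppose that for some $x \in \mathbb{R}$, there exist $c_1, c_2 > 0$ s.t. $c_1 < c_2$, and $\Psi(x, d) = \max\{-d + c_1, 0, d - c_2\}$
for all $d \in \mathbb{R}$. Let $\eta := \frac{1}{2}(c_1 + c_2)$, and $f(z) := \left((z - \mu)^2 + \sigma^2\right)^{1/2}$ for all $z \in \mathbb{R}$. Further
suppose that $\sigma > 0$, $\mathcal{I} = \mathbb{R}_+$,

$$\tfrac{1}{4}(2\mu - 3c_1 + c_2)(3c_2 - c_1 - 2\mu) < \sigma^2,$$

and $\eta - f(\eta) > 0$. Then the unique optimal solution to the primal Problem (7) is the probability
measure $Q$ having support at two points $\eta - f(\eta)$ and $\eta + f(\eta)$, with

$$Q(\eta - f(\eta)) = \frac{1}{2}\left(1 + \frac{\eta - \mu}{f(\eta)}\right), \qquad Q(\eta + f(\eta)) = \frac{1}{2}\left(1 - \frac{\eta - \mu}{f(\eta)}\right). \tag{12}$$

Also, the unique optimal solution to the dual Problem (10) is

$$\lambda_0 = \frac{\eta^2 + (\eta - \mu)^2 + \sigma^2}{2 f(\eta)} + \frac{c_1 - c_2}{2}, \qquad \lambda_1 = -\frac{\eta}{f(\eta)}, \qquad \lambda_2 = \frac{1}{2 f(\eta)}. \tag{13}$$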
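
The primal-dual pair of Theorem 2 can be sanity-checked numerically through Proposition 2.2. In the sketch below, the two-point weights are obtained from the two moment constraints and the multipliers from requiring the quadratic $\lambda_0 + \lambda_1\tau + \lambda_2\tau^2$ to be tangent to the outer affine pieces at the two support points; both are a derivation made for this illustration rather than a quotation. The function names and the inputs $c_1 = 6$, $c_2 = 8$, $\mu = 4$, $\sigma = 4$ are assumptions.

```python
import math


def three_piece_cost(d, c1, c2):
    """Psi(x, d) = max{-d + c1, 0, d - c2}."""
    return max(-d + c1, 0.0, d - c2)


def theorem2_primal_dual(c1, c2, mu, sigma):
    """Two-point primal measure and dual multipliers under the hypotheses of Theorem 2."""
    eta = 0.5 * (c1 + c2)
    f = math.sqrt((eta - mu) ** 2 + sigma ** 2)
    if not (0 < c1 < c2 and sigma > 0):
        raise ValueError("requires 0 < c1 < c2 and sigma > 0")
    if not (0.25 * (2 * mu - 3 * c1 + c2) * (3 * c2 - c1 - 2 * mu) < sigma ** 2
            and eta - f > 0):
        raise ValueError("hypotheses of Theorem 2 are violated")
    p_low = 0.5 * (1.0 + (eta - mu) / f)  # weight fixed by the mean constraint
    primal = {eta - f: p_low, eta + f: 1.0 - p_low}
    lam0 = 0.5 * (eta ** 2 + (eta - mu) ** 2 + sigma ** 2) / f + 0.5 * (c1 - c2)
    lam1 = -eta / f
    lam2 = 0.5 / f
    return primal, (lam0, lam1, lam2)


# Illustrative numbers (assumptions).
c1, c2, mu, sigma = 6.0, 8.0, 4.0, 4.0
primal, (lam0, lam1, lam2) = theorem2_primal_dual(c1, c2, mu, sigma)

primal_value = sum(p * three_piece_cost(t, c1, c2) for t, p in primal.items())
dual_value = lam0 + lam1 * mu + lam2 * (mu ** 2 + sigma ** 2)
# Dual feasibility checked on a grid of demand values in [0, 20].
dual_feasible = all(
    lam0 + lam1 * t + lam2 * t * t >= three_piece_cost(t, c1, c2) - 1e-9
    for t in (0.25 * k for k in range(81))
)
print(primal, primal_value, dual_value, dual_feasible)
```

Equal primal and dual objective values, together with feasibility of both solutions, is exactly the certificate of optimality provided by Proposition 2.2.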
3. Multistage formulation

In this section, we study a multistage extension of the distributionally robust news vendor problem
discussed in Section 2.2.

3.1. Classical formulation 

We begin by reviewing the classical (i.e. non-robust) multistage news vendor problem, and start
by introducing some additional notation. For a vector $z \in \mathbb{R}^n$, and $k \in \{1, \ldots, n\}$, let $z_{[k]}$ denote the
vector consisting of the first $k$ components of $z$, i.e. $z_{[k]} := (z_1, \ldots, z_k)$, and $z_{[0]} := \emptyset$.

We suppose that there is a finite time horizon $T$, and a (random) vector of demands $D =
(D_1, \ldots, D_T)$, such that $\{D_t,\ t = 1, \ldots, T\}$ are mutually independent. We now define the family
of admissible policies $\Pi$ by introducing two families of functions, $y = \{y_t,\ t = 1, \ldots, T\}$ and $x =
\{x_t,\ t = 1, \ldots, T\}$. Conceptually, $y_t$ will correspond to the inventory level at the start of stage $t$,
and $x_t(y_t)$ will correspond to the inventory level after having ordered in stage $t$, but before the
demand in that stage is realized.

Note that for $t \ge 2$, $y_t$ is a function of $D_{[t-1]}$, and we restrict ourselves to policies such that
the amount ordered in stage $t$ depends on realizations of past demand only through the inventory
level at the start of stage $t$, i.e. the amount ordered in stage $t$ is a function of $y_t$ rather than
the whole history $D_{[t-1]}$ of the demand process (justification for this assumption will be given
below). Such policies are nonanticipative, i.e. decisions do not depend on realizations of future
demand. We assume that $y_1$, the initial inventory level, is a given constant. We also require that
one can only order a nonnegative amount of inventory at each stage. It follows that the set of
admissible policies $\Pi$ should consist of those vectors of one-dimensional (measurable) functions
$\pi = (x_1(y_1), x_2(y_2), \ldots, x_T(y_T))$ such that $x_t : \mathbb{R} \to \mathbb{R}$ satisfies $x_t(y) \ge y$, for all $y \in \mathbb{R}$ and $t = 1, \ldots, T$.
To be consistent with the inventory dynamics, we also require that

$$y_{t+1} = x_t(y_t) - D_t, \quad t = 1, \ldots, T - 1. \tag{14}$$

It follows that any given choice of $\pi \in \Pi$, along with the given $y_1$, completely determines the
associated functions $y_1, \ldots, y_T$. Combining the above, we can write the classical multistage inventory
(news vendor) problem as follows:

$$\min_{\pi \in \Pi} \mathbb{E}\left[\sum_{t=1}^{T} \rho^{t-1}\left[c_t(x_t(y_t) - y_t) + \Psi_t(x_t(y_t), D_t)\right]\right]. \tag{15}$$

Here $\rho \in (0, 1]$ is a discount factor, $c_t, b_t, h_t$ are the ordering, backorder penalty and holding costs
per unit in stage $t$, respectively, and

$$\Psi_t(x_t, d_t) := b_t[d_t - x_t]_+ + h_t[x_t - d_t]_+. \tag{16}$$
Unless stated otherwise we assume that $b_t > c_t > 0$ and $h_t > 0$ for all $t = 1, \ldots, T$.

With Problem (15) are associated the following dynamic programming equations (e.g., Zipkin
(2000)):

$$V_t(y) = \min_{x \ge y} \left\{ c_t(x - y) + \mathbb{E}\left[\Psi_t(x, D_t) + \rho V_{t+1}(x - D_t)\right] \right\}, \tag{17}$$

$t = 1, \ldots, T$, with $V_{T+1}(\cdot) \equiv 0$. The dynamic Equations (17) naturally define a set of optimal policies
through the relation

$$x_t(y) \in \mathfrak{X}_t(y) := \arg\min_{x \ge y} \left\{ c_t(x - y) + \mathbb{E}\left[\Psi_t(x, D_t) + \rho V_{t+1}(x - D_t)\right] \right\}, \quad t = 1, \ldots, T, \tag{18}$$

with associated optimal value given by $V_1(y_1)$. Note that $x_t(y)$ are (measurable) functions of $y$,
$t = 1, \ldots, T$.

It is well-known that the two formulations (15) and (17) are equivalent in the following sense.
The optimal values of these problems are equal to each other, i.e. both have value $V_1(y_1)$, and
the respective sets of optimal policies are the same for both formulations. More precisely, for any
optimal policy $\pi = (x_1, \ldots, x_T)$ for Problem (15), and any $t \in [1, T]$, there exists a set $A \subseteq \mathbb{R}$ s.t.
$P(y_t \in A) = 1$, and $x_t(y) \in \mathfrak{X}_t(y)$ for all $y \in A$. As we shall see, this equivalence does not necessarily
hold for distributionally robust multistage inventory problems with moment constraints.

Note that it follows from (18) that the optimal policies for Problem (15) indeed have the property
that the amount ordered in stage $t$ is a function of $y_t$, and thus there is no loss of generality in
considering only policies of that form. Of course, the assumption of mutual (stage-wise) independence
is essential for this conclusion.
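
The recursion (17) and the selection rule (18) can be sketched as a small backward induction. The code below is a minimal illustration under assumptions that are not in the paper: demand takes nonnegative integer values with a known pmf, inventory levels are restricted to integers, orders are capped so that the post-order level never exceeds `hi`, and costs are stationary across stages. All names and numbers are hypothetical.

```python
def classical_newsvendor_dp(T, c, b, h, rho, pmf, hi):
    """Backward induction for (17)-(18) on an integer grid.

    pmf maps nonnegative integer demands to probabilities; hi caps the post-order level.
    Returns (V_1, policy) where policy[t][y] is the smallest minimizer in (18).
    """
    dmax = max(pmf)
    # V_{T+1} is identically zero on every level reachable after stage T.
    V_next = {y: 0.0 for y in range(-T * dmax, hi + 1)}
    policy = {}
    for t in range(T, 0, -1):
        V_t, x_t = {}, {}
        # Levels reachable at the start of stage t from a nonnegative initial inventory.
        for y in range(-(t - 1) * dmax, hi + 1):
            best_cost, best_x = None, None
            for x in range(y, hi + 1):  # only nonnegative order quantities
                cost = c * (x - y) + sum(
                    p * (b * max(d - x, 0) + h * max(x - d, 0) + rho * V_next[x - d])
                    for d, p in pmf.items()
                )
                if best_cost is None or cost < best_cost - 1e-12:
                    best_cost, best_x = cost, x
            V_t[y], x_t[y] = best_cost, best_x
        policy[t] = x_t
        V_next = V_t
    return V_next, policy


# Illustrative two-stage instance (assumed numbers).
demand_pmf = {0: 0.25, 1: 0.5, 2: 0.25}
V1, policy = classical_newsvendor_dp(T=2, c=1.0, b=4.0, h=1.0, rho=1.0,
                                     pmf=demand_pmf, hi=3)
print(V1[0], policy[1][0], policy[2][0])
```

In this instance the computed order-up-to decisions depend on the state only through a threshold in each stage, which is the basestock structure discussed later in the text.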

3.2. Distributionally robust formulations 

Suppose now that the distribution of the demand process is not known, and we only have at our
disposal information about the support and first and second moments. In this case, it is natural
to use the framework previously developed for the single stage problem (see Section 2) to handle
the distributional uncertainty at each stage. However, in the multistage setting, there is a nontrivial
question of how to model the associated uncertainty in the joint distribution of demand. The
approach we take here is similar to that taken in previous studies of time consistency of the
inventory model (cf. Shapiro (2012)). We will consider two formulations, one intuitively corresponding
to the modeling choices of a policy maker who does not recompute her policy choices after each
stage (referred to as the static formulation), and one corresponding to a policy maker who does
(referred to as the dynamic formulation). We suppose that we have been given a sequence of closed
(possibly unbounded) intervals $\mathcal{I}_t = [\alpha_t, \beta_t] \subset \mathbb{R}$, $t = 1, \ldots, T$, and sequences of the corresponding
means $\mu_t$, $t = 1, \ldots, T$, and variances $\sigma_t^2$, $t = 1, \ldots, T$.
3.2.1. Static formulation We first consider the following formulation, referred to as static,
in which the policy maker does not recompute her policy choices after each stage. Let us define

$$\mathfrak{M}_t := \{Q_t \in \mathfrak{P}(\mathcal{I}_t) : \mathbb{E}_{Q_t}[D_t] = \mu_t,\ \mathbb{E}_{Q_t}[D_t^2] = \mu_t^2 + \sigma_t^2\}, \quad t = 1, \ldots, T, \tag{19}$$

$$\mathfrak{M} := \{Q = Q_1 \times \cdots \times Q_T : Q_t \in \mathfrak{M}_t,\ t = 1, \ldots, T\}, \tag{20}$$

i.e. $\mathfrak{M}$ is the set of all product measures whose marginals belong to the associated sets $\mathfrak{M}_t$, $t =
1, \ldots, T$. In case of bounded intervals $[\alpha_t, \beta_t]$, in order for the sets $\mathfrak{M}_t$ to be nonempty we assume
that (compare with (6))

$$\mu_t \in [\alpha_t, \beta_t] \quad \text{and} \quad \sigma_t^2 \le (\beta_t - \mu_t)(\mu_t - \alpha_t), \quad t = 1, \ldots, T. \tag{21}$$

The associated minimax problem supposes that although the set of associated marginal distributions
may be "worst-case", the joint distribution will always be a product measure (i.e. the demand
will be independent across stages). We conclude that the static formulation for the distributionally
robust multistage news vendor problem may be formulated as follows:

$$\min_{\pi \in \Pi} \max_{Q \in \mathfrak{M}} \mathbb{E}_Q\left[\sum_{t=1}^{T} \rho^{t-1}\left[c_t(x_t(y_t) - y_t) + \Psi_t(x_t(y_t), D_t)\right]\right], \tag{22}$$

where $\Pi$ is the set of admissible policies defined previously in Section 3.1. Of course, if the set
$\mathfrak{M} = \{Q\}$ is a singleton, then formulation (22) coincides with formulation (15) taken with respect
to the distribution $Q = Q_1 \times \cdots \times Q_T$ of the demand vector $(D_1, \ldots, D_T)$.

We note that very little is known about the set of optimal policies for problem (22), as this
problem does not enjoy a dynamic-programming formulation along the lines of (17). Note also that
as in Section 3.1 we only consider policies for which the amount ordered in stage $t$ is a function
of $y_t$, and will discuss this choice in greater detail in the following section.

3.2.2. Dynamic formulation We now consider the so-called dynamic formulation, in which
the policy maker recomputes her policy choices after each stage. Let us consider what it means for
the policy maker to "recompute" her optimal policy at the start of the final stage $T$. As she cannot
change her past decisions, the only policy decision she still has to make is the determination of
the function $x_T$. However, she now has knowledge of $y_T$, the realized inventory level at the start
of stage $T$, which she can incorporate into her minimax computations. We note that here we are
faced with the modeling question of how to reconcile the use of $y_T$'s realized value in performing
one's minimax computations (i.e. selecting the worst-case demand in stage $T$), which depends on
past realizations of demand, with the fact that in the static formulation, the policy maker supposed
that the demand in each stage was independent.

We take a conditional approach in line with the conditioning framework used in previous
formulations of time consistent risk averse problems (cf. Ruszczyński and Shapiro (2006)). Similar to
(17) we write the following dynamic programming equations

$$V_t(y) = \min_{x \ge y} \left\{ c_t(x - y) + \max_{Q_t \in \mathfrak{M}_t} \mathbb{E}_{Q_t}\left[\Psi_t(x, D_t) + \rho V_{t+1}(x - D_t)\right] \right\}, \tag{23}$$

$t = 1, \ldots, T$, with $V_{T+1}(\cdot) \equiv 0$.

In analogy with (18), the dynamic programming equations (23) naturally define a set of optimal
policies through the relation

$$x_t(y) \in \mathfrak{Y}_t(y) := \arg\min_{x \ge y} \left\{ c_t(x - y) + \max_{Q_t \in \mathfrak{M}_t} \mathbb{E}_{Q_t}\left[\Psi_t(x, D_t) + \rho V_{t+1}(x - D_t)\right] \right\}, \quad t = 1, \ldots, T, \tag{24}$$

with associated optimal value given by $V_1(y_1)$. We define the set of optimal policies for the dynamic
formulation to be those policies $\pi = (x_1, \ldots, x_T) \in \Pi$ such that $x_t(y) \in \mathfrak{Y}_t(y)$ for all $y \in \mathbb{R}$, $t =
1, \ldots, T$. We now briefly comment on the meaning of the dynamic programming equations (24),
and also refer the reader to (Shapiro (2011), section 3.4) for a more detailed discussion.

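Note that for any given policy $\pi \in \Pi$, the objective function $\sum_{t=1}^{T} \rho^{t-1}\left[c_t(x_t(y_t) - y_t) +
\Psi_t(x_t(y_t), D_t)\right]$ for the static formulation is a deterministic function $Z = Z(D_1, \ldots, D_T)$ of
the demand process. The value associated with this policy under the static formulation is
$\max_{Q \in \mathfrak{M}} \mathbb{E}_Q[Z]$.

For the random variable $Z = Z(D_1, \ldots, D_T)$ and probability measure $Q$ of $(D_1, \ldots, D_T)$ we have

$$\mathbb{E}_Q[Z] = \mathbb{E}_{Q|D_0}\left[\mathbb{E}_{Q|D_{[1]}}\left[\cdots \mathbb{E}_{Q|D_{[T-1]}}[Z]\right]\right], \tag{25}$$

where for uniformity of the notation we assume that $D_0$ is deterministic. Thus

$$\max_{Q \in \mathfrak{M}} \mathbb{E}_Q[Z] \le \max_{Q \in \mathfrak{M}} \mathbb{E}_{Q|D_0}\left[\max_{Q \in \mathfrak{M}} \mathbb{E}_{Q|D_{[1]}}\left[\cdots \max_{Q \in \mathfrak{M}} \mathbb{E}_{Q|D_{[T-1]}}[Z]\right]\right]. \tag{26}$$

Since probability measures $Q \in \mathfrak{M}$ are of the form $Q = Q_1 \times \cdots \times Q_T$, we can write (26) as

$$\max_{Q \in \mathfrak{M}} \mathbb{E}_Q[Z] \le \max_{Q_1 \in \mathfrak{M}_1} \mathbb{E}_{Q_1}\left[\max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\cdots \max_{Q_T \in \mathfrak{M}_T} \mathbb{E}_{Q_T}[Z(D_1, \ldots, D_T)]\right]\right]. \tag{27}$$

Then informally, the value of the policy $\pi$ under the static formulation coincides with the left-hand
side of (27), while the value of the policy $\pi$ under the dynamic formulation coincides with
the right-hand side of (27). Furthermore, the nested structure of the right-hand side of (27) allows
the dynamic formulation to be solved by dynamic programming, with optimal policies defined by
(24).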
Observation 4 It follows from (27) that the optimal value $V_1(y_1)$ of the dynamic formulation is
always greater than or equal to the optimal value of the static problem (22). Moreover, in principle
the inequality can be strict. This can be explained as follows.

For the sake of simplicity let $T = 2$. Let us fix some policy $\pi \in \Pi$, and let $Z_\pi = Z_\pi(D_1, D_2)$ denote
the corresponding objective function. Clearly for any $Q_2 \in \mathfrak{M}_2$ the following inequality holds

$$\mathbb{E}_{Q_2}[Z_\pi(D_1, D_2)] \le \max_{Q_2' \in \mathfrak{M}_2} \mathbb{E}_{Q_2'}[Z_\pi(D_1, D_2)] \tag{28}$$

for all $D_1$, and hence for any $Q_1 \in \mathfrak{M}_1$ and $Q_2 \in \mathfrak{M}_2$,

$$\mathbb{E}_{Q_1 \times Q_2}[Z_\pi(D_1, D_2)] \le \mathbb{E}_{Q_1}\left[\max_{Q_2' \in \mathfrak{M}_2} \mathbb{E}_{Q_2'}[Z_\pi(D_1, D_2)]\right]. \tag{29}$$

By taking the maximum of both sides of (29) with respect to $Q_1 \in \mathfrak{M}_1$ and $Q_2 \in \mathfrak{M}_2$ we obtain (27).
Suppose that the maximum in the right hand side of (28) is attained at some distribution $\bar{Q}_2 =
\bar{Q}_2(D_1)$, which in general depends on $D_1$. If $\bar{Q}_2$ does not depend on $D_1$, then for $Q_2 = \bar{Q}_2$ the
equality holds in (28) and hence the corresponding inequality (27) holds as equality. If, on the other
hand, $\bar{Q}_2(D_1)$ does depend on $D_1$, then for any $Q_2 \in \mathfrak{M}_2$ the inequality (28) is strict for some values
of $D_1$. Therefore the possible dependence of $\bar{Q}_2(D_1)$ on $D_1$ is a reason why the inequality (27) can
be strict. We will elaborate on this further in the following sections.

In particular, suppose that $\mathfrak{M}_1 = \{Q_1\}$ is a singleton with $Q_1 = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$. Then the associated
cost under the static formulation equals

$$\max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_1 \times Q_2}[Z_\pi(D_1, D_2)] = \frac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \left( \int Z_\pi(0, \tau)\, dQ_2(\tau) + \int Z_\pi(1, \tau)\, dQ_2(\tau) \right),$$

while the associated cost under the dynamic formulation equals

$$\mathbb{E}_{Q_1}\left[\max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[Z_\pi(D_1, D_2)]\right] = \frac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \int Z_\pi(0, \tau)\, dQ_2(\tau) + \frac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \int Z_\pi(1, \tau)\, dQ_2(\tau).$$

Thus the value of the dynamic formulation is strictly greater than that of the static formulation if

$$\min_{\pi \in \Pi} \left( \max_{Q_2 \in \mathfrak{M}_2} \int Z_\pi(0, \tau)\, dQ_2(\tau) + \max_{Q_2 \in \mathfrak{M}_2} \int Z_\pi(1, \tau)\, dQ_2(\tau) \right) > \min_{\pi \in \Pi} \max_{Q_2 \in \mathfrak{M}_2} \left( \int Z_\pi(0, \tau)\, dQ_2(\tau) + \int Z_\pi(1, \tau)\, dQ_2(\tau) \right).$$
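
The gap between the two costs for a fixed policy can be seen on a toy instance. The sketch below is purely illustrative and rests on several assumptions: the full family $\mathfrak{M}_2$ is replaced by a finite list of two distributions that share mean 1 and variance 1, the fixed policy is a hypothetical one that raises inventory to 2 in stage 1 and orders nothing in stage 2, the stage-2 costs are $b = 4$ and $h = 1$, and the stage-1 cost (which does not depend on $Q_2$) is omitted. The first-stage demand follows $Q_1 = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$ as in the text.

```python
def expected(q, g):
    """Expectation of g under a finitely supported distribution q (point -> probability)."""
    return sum(p * g(tau) for tau, p in q.items())


def stage_cost(x, tau, b=4.0, h=1.0):
    """Backorder plus holding cost for inventory level x and demand tau (assumed costs)."""
    return b * max(tau - x, 0.0) + h * max(x - tau, 0.0)


def z(d1, tau):
    """Stage-2 cost of the hypothetical policy: level 2 after ordering in stage 1, no stage-2 order."""
    return stage_cost(2.0 - d1, tau)


# Finite stand-in for the moment family: both candidates have mean 1 and variance 1.
candidates = [{0.0: 0.5, 2.0: 0.5}, {0.5: 0.8, 3.0: 0.2}]

# Static cost: one worst-case Q2 chosen before D1 is observed.
static_value = 0.5 * max(
    expected(q, lambda tau: z(0, tau)) + expected(q, lambda tau: z(1, tau))
    for q in candidates
)
# Dynamic cost: the worst-case Q2 may depend on the realized D1.
dynamic_value = 0.5 * sum(
    max(expected(q, lambda tau, d=d1: z(d, tau)) for q in candidates)
    for d1 in (0, 1)
)
print(static_value, dynamic_value)
```

Here the worst-case candidate differs between the two realizations of $D_1$, which is the mechanism behind the strict inequality described above. The numbers concern the finite stand-in family only and say nothing about the full family $\mathfrak{M}_2$.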

We note that it follows from the nature of the dynamic programming equations and our
definitions that in the dynamic formulation, under any optimal policy, the amount ordered in stage
$t$ is a function of $y_t$. As we wish to compare the set of optimal policies of the static and dynamic
formulations, it makes sense to consider only those static policies which share this property, which
(as described previously) is the approach we take here. However, we note that loosening this
restriction, and considering a more general family of policies for the static problem, could potentially
introduce qualitatively different types of optimal policies for the static formulation, and this may
be an interesting direction for future research.

Let us recall the following classical definition of a basestock policy. 

Definition 3.1 A policy $\pi = (x_1(y_1), x_2(y_2), \ldots, x_T(y_T))$, for either the static or dynamic
formulation, is said to be a basestock policy if there exist constants $x_t^* \in \mathbb{R}$, $t = 1, \ldots, T$, such that $x_t(y) =
\max\{y, x_t^*\}$ for all $y \in \mathbb{R}$ and $t = 1, \ldots, T$.
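
A basestock policy and the inventory dynamics (14) are easy to simulate. The following sketch is an illustration only: the function names, the basestock levels, and the demand path are assumptions introduced for this example.

```python
def basestock_policy(levels):
    """Return the functions x_t(y) = max(y, x_t^*) for the given basestock levels."""
    return [lambda y, level=level: max(y, level) for level in levels]


def simulate_inventory(y1, policy, demands):
    """Apply y_{t+1} = x_t(y_t) - D_t and record (y_t, x_t(y_t)) for each stage."""
    path = []
    y = y1
    for x_t, demand in zip(policy, demands):
        x = x_t(y)  # inventory level after ordering, never below y
        path.append((y, x))
        y = x - demand
    return path


# Illustrative two-stage run (assumed numbers): levels 5 and 3, initial inventory 2.
path = simulate_inventory(y1=2, policy=basestock_policy([5, 3]), demands=[1, 4])
print(path)
```

In the second stage the starting inventory of 4 already exceeds the basestock level 3, so nothing is ordered.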

Observation 5 It is well known that the classical formulation of the multistage inventory problem
always has an optimal basestock policy (e.g., Zipkin (2000)). Furthermore, it follows from the
convexity of the cost-to-go functions $V_t$, defined in (23), that the distributionally robust dynamic
formulation possesses an optimal basestock policy (cf. Ahmed, Cakmak and Shapiro (2007)).



We note that the question of whether or not there exists such an optimal basestock policy for 
the static formulation is considerably more challenging, and will be central to our discussion of 
time consistency in the next section. 

4. Time consistency 

As discussed in Section 3.2, there is no a priori guarantee that the static formulation is equivalent 
to the corresponding dynamic formulation in the distributionally robust setting. Disagreement 
between these two formulations is undesirable from a policy perspective, as it suggests that a policy 
which was optimal when performing one's minimax computations before seeing any realized 
demand may no longer be optimal if one reperforms these computations at a later time. This general 
problem goes under the heading of time (in)consistency. Although first addressed within the 
economics community, the issue of time (in)consistency has recently started to receive attention in 
the stochastic and robust optimization communities (e.g., Artzner et al. (2007), Huang et al. (2011), 
Riedel (2004), Ruszczyński (2007), Carpentier et al. (2012), Shapiro (2012), and references therein). 



A well known quotation of Bellman (1957), coming from his pioneering work on dynamic programming, 
asserts that: "An optimal policy has the property that whatever the initial state and 
initial decision are, the remaining decisions must constitute an optimal policy with regard to the 
state resulting from the first decision." In a somewhat more precise form this principle has been 
formulated, e.g., in Carpentier et al. (2012) as: "The decision maker formulates an optimization 
problem at time $t_0$ that yields a sequence of optimal decision rules for $t_0$ and for the following 
time steps $t_1, \ldots, t_N = T$. Then, at the next time step $t_1$, he formulates a new problem starting at $t_1$ 
that yields a new sequence of optimal decision rules from time steps $t_1$ to $T$. Suppose the process 
continues until time $T$ is reached. The sequence of optimization problems is said to be dynamically 
consistent if the optimal strategies obtained when solving the original problem at time $t_0$ remain 
optimal for all subsequent problems." 

From a conceptual point of view this is quite natural: an optimal solution obtained by solving 
the problem at the first stage should remain optimal from the point of view of later stages. 
The setting in which one re-optimizes at each stage coincides precisely with the dynamic formulation 
for the distributionally robust multistage newsvendor problem, while the "problem at 
time $t_0$" coincides naturally with the static formulation. Intuitively, this motivates us to define 
an optimal policy $\hat{\pi} = (\hat{x}_1, \ldots, \hat{x}_T)$ of the static problem to be time consistent if $\hat{\pi}$ is also optimal 
for the dynamic formulation. However, we will have to take some care here, to rule out 
the following situation. Note that by definition, $\hat{\pi}$ is also an optimal policy for the dynamic formulation 
iff $\hat{x}_t(y) \in \mathfrak{D}_t(y)$ for all $y \in \mathbb{R}$. However, certain values of $y$ may be irrelevant under 
the static formulation, e.g. values of $y$ with zero probability, and a policy may take any action 
whatsoever for such values of $y$ while remaining optimal for the static formulation. Of course, 
we do not wish to declare a problem time inconsistent for such trivial reasons, and thus wish 
to compare policies only for "relevant" values of $y$. We now make this precise. For a policy $\pi \in \Pi$, let 

\[ \Xi^{\pi} := \operatorname*{argmax}_{Q \in \mathfrak{M}} \mathbb{E}_Q \left\{ \sum_{t=1}^{T} \rho^{t-1} \left[ c_t(x_t(y_t) - y_t) + \Psi_t(x_t(y_t), D_t) \right] \right\}. \]

If $\Xi^{\pi} = \emptyset$, we let $\bar{\Xi}^{\pi}$ denote the set of all sequences of distributions $\{Q_n, n \geq 1\}$ s.t. $Q_n \in \mathfrak{M}$ for all $n$, and 
$\lim_{n \to \infty} \mathbb{E}_{Q_n} \left\{ \sum_{t=1}^{T} \rho^{t-1} \left[ c_t(x_t(y_t) - y_t) + \Psi_t(x_t(y_t), D_t) \right] \right\}$ equals the value of $\pi$ under the static 
formulation. For $\pi \in \Pi$, $Q \in \Xi^{\pi}$, and $t \in 1, \ldots, T$, let $\mathcal{Y}_t^{\pi,Q}$ denote the family of all measurable sets 
$A \subseteq \mathbb{R}$ s.t. $Q(y_t \in A) = 1$, i.e. the probability that $y_t \in A$ under policy $\pi$, if $D_{[t]}$ is distributed as $Q$, 
equals one. For $\pi \in \Pi$ s.t. $\Xi^{\pi} = \emptyset$, and a sequence $\{Q_n, n \geq 1\} \in \bar{\Xi}^{\pi}$, let $\mathcal{Y}_t^{\pi,\{Q_n\}}$ denote the family of 
all measurable sets $A \subseteq \mathbb{R}$ s.t. $\lim_{n \to \infty} Q_n(y_t \in A) = 1$. We now combine the above ideas to formally 
define time consistency in our setting, and will use this definition for the remainder of the paper. 

Definition 4.1 Suppose that $\hat{\pi} = (\hat{x}_1, \ldots, \hat{x}_T)$ is an optimal policy of the static problem. If $\Xi^{\hat{\pi}} \neq \emptyset$, 
we say that $\hat{\pi}$ is time consistent if there exists $Q \in \Xi^{\hat{\pi}}$ s.t. for all $t \in [1,T]$, there exists $A_t \in \mathcal{Y}_t^{\hat{\pi},Q}$ 
s.t. $\hat{x}_t(y) \in \mathfrak{D}_t(y)$ for all $y \in A_t$. If $\Xi^{\hat{\pi}} = \emptyset$, we say that $\hat{\pi}$ is time consistent if there exists $\{Q_n, n \geq 1\} \in \bar{\Xi}^{\hat{\pi}}$ 
s.t. for all $t \in [1,T]$, there exists $A_t \in \mathcal{Y}_t^{\hat{\pi},\{Q_n\}}$ s.t. $\hat{x}_t(y) \in \mathfrak{D}_t(y)$ for all $y \in A_t$. We say 
that the static problem is weakly time consistent if it possesses at least one time consistent optimal 
policy. We say that the static problem is strongly time consistent if every one of its optimal policies is time 
consistent. 

That is, weak time consistency amounts to the condition that the intersection of the sets of 
optimal policies of the static and dynamic formulations is non-empty. Strong time consistency is 
equivalent to requiring that the entire set of optimal policies for the static formulation is contained 
in the set of optimal policies of the dynamic formulation. 

If the set $\mathfrak{M}$ is a singleton, and hence we are back to the classical formulation, then it is well-known 
that strong time consistency follows (we already mentioned this in Section 3.1). Interestingly, 
it follows from (Shapiro 2012, section 4.2.2) that if one only has information about the support $\mathcal{I}_t$ 
and first moment $\mu_t$ of demand at each stage, and $\mathcal{I}_t = [\alpha_t, \beta_t]$ is bounded for all $t$, then the 
corresponding distributionally robust multistage inventory problem is strongly time consistent. In 
that case the dynamic formulation reduces to the static formulation (in an appropriate sense), as 
it can be shown to follow from convexity that in all stages, the "worst-case" demand distribution 
is independent of previous demand. As we will see, the question of time consistency becomes much 
more involved when one is also given second moment information. 

4.1. Sufficient conditions for time consistency 

In this section, we provide simple sufficient conditions for the time consistency of Problem (22). We 
begin by providing a different (but equivalent) formulation for Problem (22), in which all relevant 
instances of $y_t$ are rewritten in terms of the appropriate $x_t$ functions, as this will clarify the precise 
structure of the relevant cost-to-go functions. As a notational convenience, let $c_{T+1} = 0$, in which 
case we define 

\[ \hat{\Psi}_t(x_t, d_t) := (c_t - \rho c_{t+1}) x_t + b_t [d_t - x_t]_+ + h_t [x_t - d_t]_+, \quad t = 1, \ldots, T. \tag{30} \]

Let us define the problem 

\[ \min_{\pi \in \Pi} \max_{Q \in \mathfrak{M}} \mathbb{E}_Q \left[ \sum_{t=1}^{T} \rho^{t-1} \hat{\Psi}_t(x_t(y_t), D_t) \right] - c_1 y_1 + \sum_{t=1}^{T-1} \rho^{t} c_{t+1} \mu_t. \tag{31} \]

Then it follows from a straightforward substitution and calculation that 

Observation 6 Problem (22) and Problem (31) are equivalent, i.e. each policy $\pi \in \Pi$ has the same 
value under both formulations. 
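The substitution behind Observation 6 can be checked numerically along a single demand path. The sketch below assumes that the original stage cost is $c_t(x_t - y_t) + b_t[D_t - x_t]_+ + h_t[x_t - D_t]_+$, that inventory evolves as $y_{t+1} = x_t - D_t$, and that ordering follows a basestock rule; all parameter values are arbitrary illustrations. Along a fixed path the correction term involves the realized $D_t$, which becomes $\mu_t$ in (31) after taking expectations.

```python
import random


def psi(x, d, b, h):
    """Backorder plus holding cost b[d - x]_+ + h[x - d]_+ (assumed form)."""
    return b * max(d - x, 0.0) + h * max(x - d, 0.0)


def psi_hat(x, d, c, c_next, rho, b, h):
    """Modified stage cost from (30)."""
    return (c - rho * c_next) * x + psi(x, d, b, h)


def path_costs(y1, levels, demands, c, b, h, rho):
    """Return (original cost, transformed cost plus correction) on one path."""
    T = len(demands)
    c_ext = c + [0.0]  # convention c_{T+1} = 0
    y = y1
    original = transformed = 0.0
    for t in range(T):  # t is zero-based, so rho ** t is rho^{t-1} in the text
        x = max(y, levels[t])  # basestock ordering
        original += rho ** t * (c[t] * (x - y) + psi(x, demands[t], b[t], h[t]))
        transformed += rho ** t * psi_hat(
            x, demands[t], c[t], c_ext[t + 1], rho, b[t], h[t]
        )
        y = x - demands[t]
    correction = -c[0] * y1 + sum(
        rho ** (t + 1) * c_ext[t + 1] * demands[t] for t in range(T - 1)
    )
    return original, transformed + correction


random.seed(0)
demands = [random.uniform(0.0, 5.0) for _ in range(4)]
lhs, rhs = path_costs(
    y1=1.0, levels=[2.0, 3.0, 3.5, 4.0], demands=demands,
    c=[1.0, 0.5, 2.0, 1.5], b=[3.0, 3.0, 4.0, 2.0], h=[1.0, 1.0, 0.5, 1.0], rho=0.9,
)
print(lhs, rhs)
```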

We now derive a lower bound for any policy, which intuitively comes from allowing the policy 
maker to reselect her inventory at the start of each stage, at no cost. As it turns out, this bound 
is "realizable" when the set of basestock levels is monotone increasing. For x G M, let us define 

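\[ \eta_t(x) := \max_{Q_t \in \mathfrak{M}_t} \mathbb{E}_{Q_t}[\hat{\Psi}_t(x, D_t)], \qquad \Gamma_t^{x} := \operatorname*{argmax}_{Q_t \in \mathfrak{M}_t} \mathbb{E}_{Q_t}[\hat{\Psi}_t(x, D_t)], \tag{32} \]

and let 

\[ \hat{\eta}_t := \min_{x \in \mathbb{R}} \eta_t(x) = \min_{x \in \mathbb{R}} \max_{Q_t \in \mathfrak{M}_t} \mathbb{E}_{Q_t}[\hat{\Psi}_t(x, D_t)], \qquad \hat{\Gamma}_t := \operatorname*{argmin}_{x \in \mathbb{R}} \eta_t(x) = \operatorname*{argmin}_{x \in \mathbb{R}} \max_{Q_t \in \mathfrak{M}_t} \mathbb{E}_{Q_t}[\hat{\Psi}_t(x, D_t)]. \]

For $j \geq 1$, and probability measures $Q_1, \ldots, Q_j$, denote $\otimes_{t=1}^{j} Q_t := Q_1 \times \cdots \times Q_j$. Then we have the 
following. 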
Lemma 4.1 Suppose that the sets $\Gamma_t^{x}, \hat{\Gamma}_t$ are non-empty for all $x \in \mathbb{R}$, $t = 1, \ldots, T$. Let us fix any 
$\pi = (x_1, \ldots, x_T) \in \Pi$, and $i \geq 0$. Then for any given $Q_1 \in \mathfrak{M}_1, \ldots, Q_i \in \mathfrak{M}_i$, there exist 
$Q_{i+1} \in \mathfrak{M}_{i+1}, \ldots, Q_T \in \mathfrak{M}_T$ such that 

\[ \mathbb{E}_{\otimes_{s=1}^{t} Q_s}[\hat{\Psi}_t(x_t(y_t), D_t)] \geq \hat{\eta}_t \quad \text{for all } t \geq i+1. \tag{34} \]

Furthermore, the optimal value of Problem (22) is at least $\sum_{t=1}^{T} \rho^{t-1} \hat{\eta}_t - c_1 y_1 + \sum_{t=1}^{T-1} \rho^{t} c_{t+1} \mu_t$. 

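Proof Suppose $i \in \{0, \ldots, T\}$ and $Q_1, \ldots, Q_i$ are fixed. We now prove that (34) holds for all 
$t \geq i+1$, and proceed by induction. Our particular induction hypothesis will be that there exist 
$Q_{i+1}, \ldots, Q_{i+n}$ such that 

\[ \mathbb{E}_{\otimes_{s=1}^{t} Q_s}[\hat{\Psi}_t(x_t(y_t), D_t)] \geq \hat{\eta}_t \quad \text{for all } t \in [i+1, i+n]. \tag{35} \]

We first treat the base case $n = 1$. It follows from Jensen's inequality, and the independence 
structure of the measures in $\mathfrak{M}$, that for any $Q_{i+1} \in \mathfrak{M}_{i+1}$, 

\[ \mathbb{E}_{\otimes_{s=1}^{i+1} Q_s}\left[\hat{\Psi}_{i+1}(x_{i+1}(y_{i+1}), D_{i+1})\right] \geq \mathbb{E}_{Q_{i+1}}\left[\hat{\Psi}_{i+1}\left(\mathbb{E}_{\otimes_{s=1}^{i} Q_s}[x_{i+1}(y_{i+1})], D_{i+1}\right)\right]. \]

Taking $Q_{i+1}$ to be any element of $\Gamma_{i+1}^{\mathbb{E}_{\otimes_{s=1}^{i} Q_s}[x_{i+1}(y_{i+1})]}$ ($\Gamma_1^{x_1(y_1)}$ if $i = 0$) completes the proof for $n = 1$. 

Now, suppose the induction holds for some $n$. It again follows from Jensen's inequality, and the 
independence structure of the measures in $\mathfrak{M}$, that for any $Q_{i+n+1} \in \mathfrak{M}_{i+n+1}$, 

\[ \mathbb{E}_{\otimes_{s=1}^{i+n+1} Q_s}\left[\hat{\Psi}_{i+n+1}(x_{i+n+1}(y_{i+n+1}), D_{i+n+1})\right] \geq \mathbb{E}_{Q_{i+n+1}}\left[\hat{\Psi}_{i+n+1}\left(\mathbb{E}_{\otimes_{s=1}^{i+n} Q_s}[x_{i+n+1}(y_{i+n+1})], D_{i+n+1}\right)\right]. \]

Taking $Q_{i+n+1}$ to be any element of $\Gamma_{i+n+1}^{\mathbb{E}_{\otimes_{s=1}^{i+n} Q_s}[x_{i+n+1}(y_{i+n+1})]}$ completes the induction, and the proof, 
where the second part of the lemma follows by letting $i = 0$. □ 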

We now show that the bound of Lemma 4.1 is "realizable" when the set of basestock levels is 
monotone increasing, and that in this case the associated basestock policy is optimal for both the 
static and dynamic formulations. In particular, in this setting, the associated basestock policy is 
time consistent, and thus the static problem is weakly time consistent. 

Proposition 4.1 Suppose there exists $x^* = (x_1^*, \ldots, x_T^*)$ such that $y_1 \leq x_1^*$, $\{x_t^*, t = 1, \ldots, T\}$ is 
nondecreasing, and $x_t^* \in \hat{\Gamma}_t$ for all $t = 1, \ldots, T$. Also suppose $\mathcal{I}_t \subseteq \mathbb{R}_+$ for all $t = 1, \ldots, T$. Then the 
basestock policy $\pi$ for which $x_t(y) = \max\{y, x_t^*\}$ for all $y \in \mathbb{R}$ is an optimal policy for both the static 
and dynamic formulations, and attains value $\sum_{t=1}^{T} \rho^{t-1} \hat{\eta}_t - c_1 y_1 + \sum_{t=1}^{T-1} \rho^{t} c_{t+1} \mu_t$. Consequently, 
this basestock policy is time consistent, and the static problem is weakly time consistent. 
Proof Note that under these assumptions, if policy $\pi$ is implemented under the dynamic formulation, 
then w.p.1 $x_t(y_t) = x_t^*$ for all $t = 1, \ldots, T$. It then follows from a straightforward induction 
that $\pi$ is an optimal policy for the dynamic formulation, and w.p.1, for all $t = 2, \ldots, T$, 

\[ V_t(y_t) = \hat{\eta}_t - c_t x_{t-1}^* + c_t D_{t-1} + \sum_{s=t+1}^{T} \rho^{s-t} (\hat{\eta}_s + c_s \mu_{s-1}), \]

and 

\[ V_1(y_1) = \hat{\eta}_1 - c_1 y_1 + \sum_{s=2}^{T} \rho^{s-1} (\hat{\eta}_s + c_s \mu_{s-1}). \]

Combining with Lemma 4.1 and Observation 5 completes the proof. □ 

We now show that under additional assumptions, which intuitively correspond to requiring that 
there is a unique optimal basestock policy, and in this policy all basestock constants equal zero, 
the static problem becomes strongly time consistent. 

Theorem 3 Suppose that $b_t' := b_t - c_t + \rho c_{t+1} > 0$, $h_t' := h_t + c_t - \rho c_{t+1} > 0$, $\sigma_t, \mu_t > 0$, $\mathcal{I}_t = \mathbb{R}_+$, 
$t = 1, \ldots, T$, $y_1 = 0$, and 

\[ \frac{\sigma_t^2}{\mu_t^2} > \frac{b_t'}{h_t'}, \quad t = 1, \ldots, T. \tag{36} \]

Then the set of optimal policies for the static problem is exactly the set of policies 

\[ \Pi^0 := \{\pi = (x_1, \ldots, x_T) \in \Pi : x_1(y_1) = 0,\ x_t(z) = 0 \text{ for all } z \leq 0 \text{ and } t \in [1,T]\}, \]

and the static problem is strongly time consistent. 

The result of the above theorem shows (under the specified conditions for the involved parameters) 
that if $y_1 = 0$ and variance at each stage is sufficiently large, then the basestock policy which 
always orders up to exactly zero is the optimal policy for both the static and dynamic formulations 
(compare with condition (i) of Theorem 1). This can be interpreted as asserting that if the variance 
at each stage is sufficiently high, then ordering up to any strictly positive amount only gives 
"nature" (which is picking the worst-case distribution) more "freedom" to select a distribution 
which yields higher costs. Although we do not formally prove that the conditions of Theorem 3 
are tight for strong time consistency, our later examples show that if one deviates slightly from the 
conditions of Theorem 3, it becomes possible for the static problem to lose this property. 

Proof of Theorem 3 Let $\Pi^{opt}$ denote the set of optimal policies for the static problem. It follows 
from Theorem 1(i) and Proposition 4.1 that $\Pi^0 \subseteq \Pi^{opt}$, and every policy $\pi \in \Pi^0$ is time consistent. 
Thus to prove the theorem, it suffices to demonstrate that $\Pi^0 = \Pi^{opt}$, and we begin by showing 
that $\hat{\pi} = (\hat{x}_1, \ldots, \hat{x}_T) \in \Pi^{opt}$ implies $\hat{x}_1(y_1) = 0$. Indeed, it follows from Lemma 4.1 that $\hat{\pi} \in \Pi^{opt}$ 
implies 

\[ \max_{Q \in \mathfrak{M}_1} \mathbb{E}_Q[\hat{\Psi}_1(\hat{x}_1(y_1), D_1)] = \hat{\eta}_1 = b_1 \mu_1. \]

That $\hat{x}_1(y_1)$ must equal 0 then follows from Theorem 1. 

We now show that $\hat{\pi} \in \Pi^{opt}$ implies $\hat{x}_2(z) = 0$ for all $z \leq 0$. Suppose for contradiction that there 
exists $z' \leq 0$ s.t. $\hat{x}_2(z') \neq 0$. It is easily verified that there exists $Q_1 \in \mathfrak{M}_1$ such that $Q_1(-z') > 0$, 
and consequently for this choice of $Q_1$, $\hat{x}_2(y_2)$ is not a.s. equal to 0. We conclude from Proposition 
2.1 that there exists $Q_2 \in \mathfrak{M}_2$ such that 

\[ \mathbb{E}_{Q_1 \times Q_2}[\hat{\Psi}_2(\hat{x}_2(y_2), D_2)] > \hat{\eta}_2 = b_2 \mu_2. \]

As we have already demonstrated that $\hat{x}_1(y_1) = 0$, and $Q_1 \in \mathfrak{M}_1$, we conclude that 

\[ \mathbb{E}_{Q_1}[\hat{\Psi}_1(\hat{x}_1(y_1), D_1)] = \hat{\eta}_1 = b_1 \mu_1. \]

Combining with Lemma 4.1 then yields a contradiction. The proof that $\hat{x}_t(z) = 0$ for all $z \leq 0$ and 
$t \geq 3$ follows from a nearly identical argument, and we omit the details. □ 

4.2. Further study of time (in)consistency 

In this section, we show that the static problem is neither strongly nor weakly time consistent 
in general. Furthermore, our examples demonstrate that the question of weak and strong time 
consistency can be quite subtle in this setting. Throughout this section, we will let $\Pi^{opt}$ denote 
the set of all optimal policies for the corresponding static problem, and $\Pi^{dyn}$ denote the set of all 
optimal policies for the corresponding dynamic problem. 

4.2.1. Example when the static problem is not weakly time consistent In this section, 
we explicitly provide an example for which the static problem is not weakly time consistent, showing 
that in general, the static and dynamic formulations need not have a common optimal policy. 
Furthermore, for this example, the static and dynamic formulations have different optimal values. 

Let us define $y_1 = 10$, $\rho = 1$, 

\[ \mathcal{I}_1 = [1,3],\ \mu_1 = 2,\ \sigma_1 = 1,\ c_1 = 0,\ b_1 = 2,\ h_1 = 2, \]

\[ \mathcal{I}_2 = \mathbb{R}_+,\ \mu_2 = 8,\ \sigma_2 = 2,\ c_2 = 0,\ b_2 = 1,\ h_2 = 1. \]

Let $\Pi_s$ denote the set of policies $\pi = (x_1, x_2)$ such that $x_1(10) = 10$, $x_2(9) = 9$, $x_2(7) = 7$, and $\Pi_d$ 
denote the set of policies $\pi = (x_1, x_2)$ such that $x_1(10) = 10$, $x_2(9) = 9$, $x_2(7) = 8$. Note that the set 
$\Pi_s$ specifies values of $x_2(y_2)$ only for $y_2 = 9$ and $y_2 = 7$. As we will see in Lemma 4.2, other values 
of $y_2$ are irrelevant for the static formulation regarding optimality. 

Theorem 4 $\Pi^{opt} = \Pi_s$, and the optimal value of the static problem is 18. On the other hand, 
$\Pi^{dyn} \subseteq \Pi_d$, and the optimal value of the dynamic problem is $17 + \frac{\sqrt{5}}{2} > 18$. Consequently, the static 
problem is not weakly time consistent, and the static and dynamic problems have different optimal 
values. 




We first characterize the set of optimal policies for the static problem. 

Lemma 4.2 $\Pi^{opt} = \Pi_s$, and the static problem has optimal value 18. 

Proof It follows from Observation 1 that $\mathfrak{M}_1$ consists of the single probability measure $Q_1$ such 
that $Q_1(1) = Q_1(3) = \frac{1}{2}$. Let $D_1$ denote a r.v. distributed as $Q_1$. Note that for any policy 
$\pi = (x_1, x_2) \in \Pi$, one has that $x_1(y_1) = x_1(10) \geq 10$. Consequently, $\Pr(x_1(y_1) \geq D_1) = 1$, and 
$|x_1(y_1) - D_1| = x_1(y_1) - D_1$ w.p.1. It then follows from a straightforward calculation that the cost of any 
policy $\pi = (x_1, x_2) \in \Pi$ under the static formulation equals 

\[ 2 x_1(10) - 4 + \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|x_2(x_1(10) - 1) - D_2| + |x_2(x_1(10) - 3) - D_2|\right)\right]. \tag{37} \]

Let $\hat{\pi} = (\hat{x}_1, \hat{x}_2)$ denote any optimal policy for the static problem, i.e. $\hat{\pi} \in \Pi^{opt}$. Then it follows from 
(37) and a straightforward contradiction argument that 

\[ \hat{x}_1(10) = 10. \tag{38} \]

Combining (37) and (38), we conclude that 

\[ (\hat{x}_2(9), \hat{x}_2(7)) \in \operatorname*{argmin}_{(x,y):\, x \geq 9,\, y \geq 7}\ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|x - D_2| + |y - D_2|\right)\right]. \tag{39} \]

Furthermore, it follows from Lemma 4.1 and Theorem 1 that 

\[ \min_{(x,y):\, x \geq 9,\, y \geq 7}\ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|x - D_2| + |y - D_2|\right)\right] \geq \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[|8 - D_2|\right] = 2. \tag{40} \]

Noting that 

\[ \tfrac{1}{2}\left(|9 - D_2| + |7 - D_2|\right) = 1 + \max(-D_2 + 7,\ 0,\ D_2 - 9), \]

it then follows from a straightforward calculation and Theorem 2 that 

\[ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|9 - D_2| + |7 - D_2|\right)\right] = 2. \tag{41} \]

Combining the above, we conclude that $\Pi_s \subseteq \Pi^{opt}$. Also, it then follows from a straightforward 
calculation that the static problem has optimal value 18. 

We now prove that $\Pi_s = \Pi^{opt}$. Indeed, suppose for contradiction that there exists some optimal 
policy $\tilde{\pi} = (\tilde{x}_1, \tilde{x}_2) \notin \Pi_s$. In that case, it follows from (38) and (39) that $\frac{1}{2}(\tilde{x}_2(9) + \tilde{x}_2(7)) > 8$. 
However, it then follows from Jensen's inequality, Theorem 1 and (40) that 

\[ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|\tilde{x}_2(9) - D_2| + |\tilde{x}_2(7) - D_2|\right)\right] \geq \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\left|\tfrac{1}{2}(\tilde{x}_2(9) + \tilde{x}_2(7)) - D_2\right|\right] > 2. \]

Combining with (40) and (41) yields a contradiction, completing the proof. □ 
We now characterize the set of optimal policies for the dynamic problem. 
Lemma 4.3 $\Pi^{dyn} \subseteq \Pi_d$, and the dynamic problem has optimal value $17 + \frac{\sqrt{5}}{2}$. 

Proof Let $\hat{\pi} = (\hat{x}_1, \hat{x}_2)$ denote any optimal policy for the dynamic problem, i.e. $\hat{\pi} \in \Pi^{dyn}$. Then 
it again follows from a straightforward contradiction argument that 

\[ \hat{x}_1(10) = 10. \tag{42} \]

It then follows from (42) that 

\[ \hat{x}_2(9) \in \operatorname*{argmin}_{x \geq 9}\ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x - D_2|], \]

and 

\[ \hat{x}_2(7) \in \operatorname*{argmin}_{x \geq 7}\ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x - D_2|]. \]

The lemma then follows from Theorem 1 and a straightforward calculation. □ 
Combining Lemmas 4.2 and 4.3 completes the proof of Theorem 4. 
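The two optimal values in Theorem 4 can be reproduced with a short calculation. The sketch below assumes the classical Scarf bound $\max_Q \mathbb{E}_Q|x - D| = ((x-\mu)^2 + \sigma^2)^{1/2}$ (valid here because the two support points of the worst-case distribution are nonnegative), and checks the static stage-2 value of 2 by exhibiting a feasible two-point distribution together with a quadratic majorant; the function and variable names are illustrative, not the paper's.

```python
import math


def scarf_abs(x, mu, sigma):
    """Worst-case E|x - D| over distributions with mean mu and std sigma."""
    return math.sqrt((x - mu) ** 2 + sigma ** 2)


mu2, sigma2 = 8.0, 2.0
stage1 = 2 * 10 - 4  # E[2 * (10 - D1)] with E[D1] = 2

# Static: nature commits to one Q2 for both branches y2 = 9 and y2 = 7.
# With x2(9) = 9 and x2(7) = 7 the stage-2 cost is max(|8 - D2|, 1).
two_point = {6.0: 0.5, 10.0: 0.5}  # mean 8, variance 4
primal = sum(p * max(abs(8 - d), 1.0) for d, p in two_point.items())

# Upper-bound certificate: q(d) = 1 + (d - 8)^2 / 4 dominates max(|8 - d|, 1),
# and E[q(D2)] = 1 + 4 / 4 = 2 for every Q2 with mean 8 and variance 4.
grid = [i / 100 for i in range(0, 3001)]
dual_ok = all(1 + (d - 8) ** 2 / 4 >= max(abs(8 - d), 1.0) - 1e-12 for d in grid)
static_value = stage1 + primal

# Dynamic: nature re-optimizes per branch; the policy uses x2(9) = 9, x2(7) = 8.
dynamic_value = (stage1
                 + 0.5 * scarf_abs(9, mu2, sigma2)
                 + 0.5 * scarf_abs(8, mu2, sigma2))

print(static_value, dynamic_value)
```

The gap arises entirely in the branch $y_2 = 9$, where nature, knowing the inventory level, extracts $\sqrt{5} > 2$.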

4.2.2. Example when the static problem is weakly time consistent, but not strongly 
time consistent In this section, we explicitly provide an example showing that it is possible for 
the static problem to be weakly time consistent, but not strongly time consistent. In particular, 
the static and dynamic formulations have a common optimal basestock policy $\pi^*$, with associated 
basestock constants $x_1^*, x_2^*$ satisfying the conditions of Proposition 4.1, yet the static problem 
has other non-trivial optimal policies which are suboptimal for the dynamic formulation. The 
intuitive explanation is as follows. In the static formulation, one can leverage the randomness in the 
realization of $D_1$ to construct a policy $\pi'$ such that with positive probability $x_2'(y_2)$ is slightly below 
$x_2^*$, and with the remaining probability is slightly above $x_2^*$. Since in the static formulation "nature" 
cannot observe the realized inventory in stage 2 before selecting a worst-case distribution, it turns 
out that such a policy incurs the same cost as $\pi^*$ under the static formulation. In contrast, this 
policy is suboptimal in the dynamic formulation, as the adversary can first see exactly how the 
inventory level deviated from $\pi^*$, and exploit this to achieve a strictly higher cost. We note that 
in this example, even though the static problem is not strongly time consistent, both formulations 
have the same optimal value, as dictated by Proposition 4.1. 

Let us define $y_1 = 0$, $\rho = 1$, 

\[ \mathcal{I}_1 = [1,3],\ \mu_1 = 2,\ \sigma_1 = 1,\ c_1 = 0,\ b_1 = 1,\ h_1 = 1, \]

\[ \mathcal{I}_2 = \mathbb{R}_+,\ \mu_2 = 10,\ \sigma_2 = 1,\ c_2 = 0,\ b_2 = 1,\ h_2 = 1. \]

Then we prove the following. 

Theorem 5 The static problem is weakly time consistent, but not strongly time consistent. 

We first prove that the static problem is weakly time consistent. 

Lemma 4.4 The static problem is weakly time consistent, and both the static and dynamic problems 
have optimal value 2. 

Proof Note that 

\[ \hat{\Psi}_1(x_1, d_1) = |x_1 - d_1|, \qquad \hat{\Psi}_2(x_2, d_2) = |x_2 - d_2|. \]

It follows from Observation 1 that $\mathfrak{M}_1$ consists of the single probability measure $Q_1$ such that 
$Q_1(1) = Q_1(3) = \frac{1}{2}$. It follows from Theorem 1 and a straightforward calculation that 

\[ \hat{\Gamma}_1 = [1,3], \qquad \hat{\Gamma}_2 = \{10\}, \qquad \hat{\eta}_2 = 1. \]

Combining the above with Proposition 4.1, we conclude that the basestock policy $\pi$ such that 
$x_1(y) = \max\{3, y\}$, and $x_2(y) = \max\{10, y\}$ for all $y \in \mathbb{R}$, is optimal for both the static and dynamic 
problems, which have common optimal value 2. □ 

We now prove that the static problem is not strongly time consistent. In particular, consider the 
policy $\pi' = (x_1', x_2')$ such that 

\[ x_1'(y) = \max\{3, y\}, \quad \text{and} \quad x_2'(y) = \begin{cases} \max\{9.9, y\}, & \text{if } y = 0, \\ \max\{10.1, y\}, & \text{otherwise.} \end{cases} \tag{43} \]

Lemma 4.5 The policy $\pi' \in \Pi^{opt}$, but $\pi' \notin \Pi^{dyn}$. Consequently, the static problem is not strongly 
time consistent. 

Proof We first show that $\pi' \in \Pi^{opt}$. It follows from a straightforward calculation that the cost 
of $\pi'$ under the static formulation equals 

\[ \mathbb{E}_{Q_1}[|3 - D_1|] + 0.1 + \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\max\{9.9 - D_2,\ 0,\ D_2 - 10.1\}\right]. \tag{44} \]

It is easily verified that the conditions of Theorem 2 are met, and we may apply Theorem 2 to 
conclude that $\operatorname*{argmax}_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\max\{9.9 - D_2, 0, D_2 - 10.1\}\right]$ is the probability measure $Q_2$ such 
that $Q_2(9) = \frac{1}{2}$, $Q_2(11) = \frac{1}{2}$. It follows that the value of the expression in (44) equals 2, and we conclude 
that $\pi' \in \Pi^{opt}$. 

We now show that $\pi' \notin \Pi^{dyn}$. Suppose, for contradiction, that $\pi' \in \Pi^{dyn}$. It then follows from a 
straightforward calculation that 

\[ 9.9 \in \operatorname*{argmin}_{x \geq 0}\ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x - D_2|]. \tag{45} \]

However, it follows from Theorem 1 that the right-hand side of (45) is the singleton $\{10\}$, completing 
the proof. □ 

Combining Lemmas 4.4 and 4.5 completes the proof of Theorem 5. 
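The contrast between the two formulations for the perturbed policy can be made concrete. The sketch below evaluates the static cost (44) under the worst-case two-point $Q_2$ stated in the proof, and the dynamic cost under the assumption that nature applies Scarf's bound $((x-\mu)^2+\sigma^2)^{1/2}$ separately to the realized stage-2 levels 9.9 and 10.1; variable names are illustrative.

```python
import math


def scarf_abs(x, mu, sigma):
    """Worst-case E|x - D| over distributions with mean mu and std sigma."""
    return math.sqrt((x - mu) ** 2 + sigma ** 2)


mu2, sigma2 = 10.0, 1.0

# Stage 1: D1 is 1 or 3 with probability 1/2 each, and x1 = 3.
stage1 = 0.5 * abs(3 - 1) + 0.5 * abs(3 - 3)

# Static cost (44), using the worst-case Q2 on {9, 11} given in the proof.
worst_q2 = {9.0: 0.5, 11.0: 0.5}
static_cost = stage1 + 0.1 + sum(
    p * max(9.9 - d, 0.0, d - 10.1) for d, p in worst_q2.items()
)

# Dynamic cost: nature sees whether inventory is 9.9 or 10.1 and reacts.
dynamic_cost = (stage1
                + 0.5 * scarf_abs(9.9, mu2, sigma2)
                + 0.5 * scarf_abs(10.1, mu2, sigma2))

print(static_cost, dynamic_cost)
```

Under these assumptions the static cost matches the common optimal value 2, while the dynamic cost is $1 + \sqrt{1.01}$, which is slightly larger.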
4.2.3. Example when the static problem is strongly time consistent, but the two 
formulations have a different optimal value In this section, we explicitly provide an example 
showing that it is possible for the static problem to be strongly time consistent, yet for the two 
formulations to have different optimal values. We note that, although it is expected that there will 
be settings where the two formulations have different optimal values, it is somewhat surprising 
that this is possible even when the two formulations have the same set of optimal policies. 

Let us define $y_1 = 0$, $\rho = 1$, 

\[ \mathcal{I}_1 = [1,3],\ \mu_1 = 2,\ \sigma_1 = 1,\ c_1 = 0,\ b_1 = 0,\ h_1 = 0, \]

\[ \mathcal{I}_2 = \mathbb{R}_+,\ \mu_2 = 100,\ \sigma_2 = 5,\ c_2 = 2,\ b_2 = 1,\ h_2 = 1. \]

Let $\bar{\Pi}$ denote the set of policies $\pi = (x_1, x_2)$ such that $x_1(0) = 102$, $x_2(101) = 101$, $x_2(99) = 99$. 
Then we prove the following. 

Theorem 6 $\Pi^{opt} = \bar{\Pi}$, and the static problem is strongly time consistent. However, the optimal 
value of the static problem equals 5, while the optimal value of the dynamic problem equals $\sqrt{26} > 5$. 

We first characterize the set of optimal policies for the static problem. 

Lemma 4.6 $\Pi^{opt} = \bar{\Pi}$, and the static problem has optimal value 5. 

Proof It follows from Observation 1 that $\mathfrak{M}_1$ consists of the single probability measure $Q_1$ 
such that $Q_1(1) = Q_1(3) = \frac{1}{2}$. In this case, the cost of any policy $\pi = (x_1, x_2) \in \Pi$ under the static 
formulation equals 

\[ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[ \mathbb{E}_{Q_1}\left[ 2\left(x_2(x_1(0) - D_1) - (x_1(0) - D_1)\right) + |x_2(x_1(0) - D_1) - D_2| \right] \right]. \tag{46} \]

We now prove that for any policy $\hat{\pi} = (\hat{x}_1, \hat{x}_2) \in \Pi^{opt}$, one has that 

\[ \hat{x}_2(\hat{x}_1(0) - 1) = \hat{x}_1(0) - 1 \quad \text{and} \quad \hat{x}_2(\hat{x}_1(0) - 3) = \hat{x}_1(0) - 3. \tag{47} \]

Indeed, note that w.p.1, it follows from the triangle inequality that 

\[ \begin{aligned} & 2\left(x_2(x_1(0) - D_1) - (x_1(0) - D_1)\right) + |x_2(x_1(0) - D_1) - D_2| \\ &\quad = 2\left(x_2(x_1(0) - D_1) - (x_1(0) - D_1)\right) + |x_2(x_1(0) - D_1) - (x_1(0) - D_1) + (x_1(0) - D_1) - D_2| \\ &\quad \geq 2\left(x_2(x_1(0) - D_1) - (x_1(0) - D_1)\right) + |(x_1(0) - D_1) - D_2| - |x_2(x_1(0) - D_1) - (x_1(0) - D_1)| \\ &\quad = x_2(x_1(0) - D_1) - (x_1(0) - D_1) + |x_1(0) - D_1 - D_2|. \end{aligned} \tag{48} \]

Now, suppose for contradiction that (47) does not hold. It follows that 

\[ \mathbb{E}_{Q_1}\left[x_2(x_1(0) - D_1) - (x_1(0) - D_1)\right] > 0, \]

and combining with (48), we conclude that (46) is strictly greater than 

\[ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[ \mathbb{E}_{Q_1}\left[ |x_1(0) - D_1 - D_2| \right] \right]. \tag{49} \]

Noting that (49) is the cost incurred by some policy satisfying (47) completes the proof. 
We now complete the proof of the lemma. It suffices from the above to prove that 

\[ \operatorname*{argmin}_{x_1 \in \mathbb{R}_+}\ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|x_1 - 1 - D_2| + |x_1 - 3 - D_2|\right)\right] = \{102\}. \tag{50} \]

It follows from a straightforward calculation that as long as $x_1 \geq 3$, $(x_1 - 100)(104 - x_1) \leq 25$ and 
$x_1 - 2 - ((x_1 - 2 - 100)^2 + 25)^{1/2} \geq 0$, which holds for all $x_1 \in [100, 104]$, the conditions of Theorem 
2 are met. We may thus apply Theorem 2 to conclude that for all $x_1 \in [100, 104]$, 

\[ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}\left[\tfrac{1}{2}\left(|x_1 - 1 - D_2| + |x_1 - 3 - D_2|\right)\right] \tag{51} \]

has the unique optimal solution $Q_2$ such that 

\[ Q_2\left(x_1 - 2 - ((x_1 - 2 - 100)^2 + 25)^{1/2}\right) = 25\left(25 + \left(x_1 - 2 - ((x_1 - 2 - 100)^2 + 25)^{1/2} - 100\right)^2\right)^{-1}, \]

and 

\[ Q_2\left(x_1 - 2 + ((x_1 - 2 - 100)^2 + 25)^{1/2}\right) = 1 - 25\left(25 + \left(x_1 - 2 - ((x_1 - 2 - 100)^2 + 25)^{1/2} - 100\right)^2\right)^{-1}. \]

It then follows from a straightforward calculation that for $x_1 \in [100, 104]$, (51) has the value 

\[ g(x_1) := (x_1^2 - 204 x_1 + 10429)^{1/2}. \]

It is easily verified that $g$ is a strictly convex function on $[100, 104]$, $g$ has its unique minimum on 
that interval at the point 102, and $g(102) = 5$. The desired result then follows from the fact that 
(51) is a convex function of $x_1$ on $\mathbb{R}$. □ 
We now prove that the static problem is strongly time consistent. 

Lemma 4.7 The static problem is strongly time consistent, and the optimal value of the dynamic 
problem equals $\sqrt{26}$. 

Proof First, we note that as in the static setting, any policy $\hat{\pi} = (\hat{x}_1, \hat{x}_2) \in \Pi^{dyn}$ also satisfies 
(47). The proof is very similar to that used for the static case, and we omit the details. To prove 
the lemma, it thus suffices to prove that 

\[ \operatorname*{argmin}_{x_1 \in \mathbb{R}_+} \left( \tfrac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x_1 - 1 - D_2|] + \tfrac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x_1 - 3 - D_2|] \right) = \{102\}. \tag{52} \]

It is easily verified that for all $x_1 \in [100, 104]$, we may apply Theorem 1 to conclude that 

\[ \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x_1 - 1 - D_2|] = ((x_1 - 101)^2 + 25)^{1/2}, \qquad \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x_1 - 3 - D_2|] = ((x_1 - 103)^2 + 25)^{1/2}. \]

We conclude that for all $x_1 \in [100, 104]$, 

\[ \tfrac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x_1 - 1 - D_2|] + \tfrac{1}{2} \max_{Q_2 \in \mathfrak{M}_2} \mathbb{E}_{Q_2}[|x_1 - 3 - D_2|] \tag{53} \]

equals 

\[ g(x_1) := \tfrac{1}{2}\left( ((x_1 - 101)^2 + 25)^{1/2} + ((x_1 - 103)^2 + 25)^{1/2} \right). \tag{54} \]

It is easily verified that $g(x)$ is a strictly convex function of $x$ on $[100, 104]$, $g$ has its unique 
minimum on that interval at the point 102, and $g(102) = \sqrt{26}$. The desired result then follows from 
the fact that (53) is a convex function of $x_1$ on $\mathbb{R}$. □ 

Combining Lemmas 4.6 and 4.7 completes the proof of Theorem 6. 
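As a quick numerical check of the two closed forms used in the proofs of Lemmas 4.6 and 4.7, the sketch below evaluates both objective functions on a grid over $[100, 104]$ and locates their minimizers. The names `g_static` and `g_dynamic` are labels introduced here to distinguish the two functions that the text both calls $g$.

```python
import math


def g_static(x1):
    """Closed form of (51): (x1^2 - 204*x1 + 10429)^(1/2)."""
    return math.sqrt((x1 - 102) ** 2 + 25)


def g_dynamic(x1):
    """Closed form (54)."""
    return 0.5 * (math.sqrt((x1 - 101) ** 2 + 25)
                  + math.sqrt((x1 - 103) ** 2 + 25))


# Grid over [100, 104] with step 0.001; it contains the point 102 exactly.
grid = [100 + i / 1000 for i in range(4001)]
argmin_static = min(grid, key=g_static)
argmin_dynamic = min(grid, key=g_dynamic)

print(argmin_static, g_static(102.0))
print(argmin_dynamic, g_dynamic(102.0))
```

Both functions are minimized at the same first-stage level, which is why the optimal policies coincide even though the optimal values differ.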

5. Conclusion 

In this paper, we considered a minimax approach to managing an inventory under distributional 
uncertainty. In particular, we studied the associated multistage distributionally robust optimization 
problem, when only the mean, variance and distribution support are known for the demand at 
each stage. Our contributions were four-fold. First, we gave a novel definition for time consistency 
in this context. More precisely, we defined two formulations for the relevant optimization problem. 
In the static formulation, the policy-maker cannot recompute her policy after observing realized 
demand. In the dynamic formulation, she is allowed to reperform her minimax computations at 
each stage. If these two formulations have a common optimal policy, we defined the static problem 
to be weakly time consistent. If all optimal policies of the static problem are also optimal for the 
dynamic problem, we defined the static problem to be strongly time consistent. 

Next, we gave sufficient conditions for weak and strong time consistency. Intuitively, our sufficient 
condition for weak time consistency coincided with the existence of an optimal basestock policy 
in which the basestock constants are monotone increasing. Our sufficient condition for strong time 
consistency could be interpreted in two ways. On the one hand, strong time consistency holds if 
the unique optimal basestock policy for the dynamic formulation is to order up to zero at each stage. 
Alternatively, we saw that this condition also has an interpretation in terms of requiring that the 
demand variances are sufficiently large relative to their respective appropriate means. 

Third, we gave a series of examples of two-stage problems exhibiting interesting and counterintuitive 
time (in)consistency properties, showing that the question of time consistency can be quite 
subtle in this setting. In particular, we showed that: (i) the static problem could fail to be weakly 
time consistent, (ii) the static problem could be weakly but not strongly time consistent, and (iii) 
the static problem could be strongly time consistent even when the two formulations had different 
optimal values. Interestingly, this stands in contrast to the analogous setting in which only the 
mean and support of the demand distribution are known at each stage, for which it is known that 
such time inconsistency cannot occur (Shapiro (2012)). 



Finally, as it was necessary for our investigations of time consistency, we extended Scarf's well-known 
solution to the single-stage distributionally robust newsvendor problem for convex, continuous, 
piecewise affine functions with exactly two pieces to a class of convex, continuous, piecewise 
affine functions with three pieces. We further note that this technique should generalize to any 
number of pieces, and such results may be of independent interest. 

Our work leaves many interesting directions for future research. The general question of time 
consistency remains poorly understood. Furthermore, our work has shown that this question can be 
quite subtle. For the particular model we consider here, it would be interesting to develop a better 
understanding of the tightness of our sufficient conditions. Our examples w.r.t. time (in)consistency 
demonstrated that surprising phenomena can occur here, and a more thorough investigation of 
exactly which types of phenomena can and cannot happen, and when, would be beneficial. It is 
also an intriguing question to understand how much the two formulations can differ in optimal 
value and policy, even when time inconsistency occurs, along the lines of Huang et al. (2011), 
Agrawal et al. (2012). On a related note, it is largely open to develop a broader understanding of the 
optimal solution to the static problem, or even approximately optimal solutions, as well as related 
algorithms. We note that one potential avenue for understanding this question is to consider a third 
formulation, in which the adversary can select any demand distribution $(D_1, \ldots, D_T)$ whatsoever, 
so long as the marginal distribution $Q_t \in \mathfrak{M}_t$ for all $t$, as this formulation always yields an optimal 
value even larger than that of the dynamic formulation. Of course, it is also an open challenge to 
understand the question of time consistency more broadly, and more generally to understand the 
relationship between different ways to model optimization under uncertainty. 
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6. Appendix 

6.1. Proof of Theorem [1] 

Proof of Theorem{l\ We first compute the value of ip{x) for all x G M, and proceed by a case 

analysis. First, suppose x < 0. In this case, E,q[^{x, D)] = cx + b{^ — x) for all Q G 93t, and thus 

'4j{x) = cx-\-b{^ — x). (55) 

Now, suppose X > 0. Then it is easily verified that 

,, X (h — b)(x — u) b + h „ „ „,, 

V^(x) = cx + ^ ^ ^ + __maxEQ[|x-Z?|]. (56) 

Hence to compute il>{x), it suffices to solve maxggOT Eg [|x — D\], and we proceed by a case analysis. 
Recall that ^{z) := {{z - ^if + a^Y for ah z G M. 

First, suppose $x \geq \frac{\mu^2 + \sigma^2}{2\mu}$. Let us define $\lambda = (\lambda_0, \lambda_1, \lambda_2)$ s.t.

$$\lambda_0 := \frac{1}{2}\left(x^2 f^{-1}(x) + f(x)\right), \quad \lambda_1 := -x f^{-1}(x), \quad \lambda_2 := \frac{1}{2} f^{-1}(x),$$

where $f^{-1}(x)$ denotes the reciprocal $1/f(x)$, and let $g(d) := \lambda_0 + \lambda_1 d + \lambda_2 d^2$ for all $d \in \mathbb{R}$. Then it follows from a
straightforward calculation that $g(d)$ and $|x - d|$ are tangent at $d_1 := x - f(x)$ and $d_2 := x + f(x)$, and
consequently $g(d) \geq |x - d|$ for all $d \in \mathbb{R}_+$. Hence $\lambda$ is feasible for the dual Problem (10). Also, as
$x \geq \frac{\mu^2 + \sigma^2}{2\mu}$ implies $d_1 \geq 0$, it is easily verified that the probability measure $Q$ such that

$$Q(d_1) = \sigma^2\left(\sigma^2 + (x - f(x) - \mu)^2\right)^{-1}, \quad Q(d_2) = 1 - \sigma^2\left(\sigma^2 + (x - f(x) - \mu)^2\right)^{-1}$$

is feasible for the primal Problem (7). It follows from Proposition 2.2 that $Q$ is an optimal primal
solution. Combining the above and simplifying the relevant algebra, we conclude that in this case

$$\psi(x) = \psi_1(x) := c\mu + \frac{b + h}{2} f(x) - \frac{b - h - 2c}{2}(x - \mu). \tag{57}$$
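The construction in this case can be checked numerically. The following minimal Python sketch builds the two-point measure on $d_1 = x - f(x)$, $d_2 = x + f(x)$ and the dual quadratic $g$, then confirms that the measure has mean $\mu$ and variance $\sigma^2$, that its objective value equals $f(x)$, and that $g(d) \geq |x - d|$ on a grid. The function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def f(z, mu, sigma):
    """f(z) = sqrt((z - mu)^2 + sigma^2), as recalled in the proof."""
    return math.hypot(z - mu, sigma)

def tangent_two_point(x, mu, sigma):
    """Two-point measure on d1 = x - f(x) and d2 = x + f(x) (hypothetical helper)."""
    fx = f(x, mu, sigma)
    d1, d2 = x - fx, x + fx
    q1 = sigma**2 / (sigma**2 + (d1 - mu)**2)
    return (d1, q1), (d2, 1.0 - q1)

def dual_g(d, x, mu, sigma):
    """g(d) = lambda0 + lambda1*d + lambda2*d^2 with the multipliers of this case."""
    fx = f(x, mu, sigma)
    lam0 = 0.5 * (x**2 / fx + fx)
    lam1 = -x / fx
    lam2 = 0.5 / fx
    return lam0 + lam1 * d + lam2 * d**2

# Illustrative values (assumed); here x >= (mu^2 + sigma^2) / (2 mu) = 1.25.
mu, sigma, x = 2.0, 1.0, 3.0
(d1, q1), (d2, q2) = tangent_two_point(x, mu, sigma)

mean = q1 * d1 + q2 * d2                            # should equal mu
var = q1 * (d1 - mu)**2 + q2 * (d2 - mu)**2         # should equal sigma^2
primal_value = q1 * abs(x - d1) + q2 * abs(x - d2)  # E_Q |x - D|
dual_value = (dual_g(0.0, x, mu, sigma)             # E_Q g(D), using only the
              + (dual_g(1.0, x, mu, sigma) - dual_g(-1.0, x, mu, sigma)) / 2 * mu
              + (dual_g(1.0, x, mu, sigma) + dual_g(-1.0, x, mu, sigma)
                 - 2 * dual_g(0.0, x, mu, sigma)) / 2 * (mu**2 + sigma**2))

# g majorizes |x - d| on a grid of nonnegative demands.
majorizes = all(dual_g(0.01 * k, x, mu, sigma) >= abs(x - 0.01 * k) - 1e-12
                for k in range(2001))
```

Matching primal and dual values at a feasible pair is what certifies optimality here; the sketch recovers the multipliers from finite differences of `dual_g` only to keep the dual value computation independent of the primal one.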
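Alternatively, suppose $x \in \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right)$. Let us define $\lambda = (\lambda_0, \lambda_1, \lambda_2)$ s.t.

$$\lambda_0 := x, \quad \lambda_1 := 1 - \frac{4\mu x}{\mu^2 + \sigma^2}, \quad \lambda_2 := \frac{2\mu^2 x}{(\mu^2 + \sigma^2)^2},$$

and let $g(d) := \lambda_0 + \lambda_1 d + \lambda_2 d^2$ for all $d \in \mathbb{R}$. Then it follows from a straightforward calculation
that $g(d)$ and $|x - d|$ are tangent at $d_1 := \mu^{-1}(\mu^2 + \sigma^2)$, and intersect at $d_2 := 0$, with $g'(0) \geq -1$. It
follows that $g(d) \geq |x - d|$ for all $d \in \mathbb{R}_+$. Hence $\lambda$ is feasible for the dual Problem (10). Also, it is
easily verified that the probability measure $Q$ such that

$$Q(0) = \frac{\sigma^2}{\mu^2 + \sigma^2}, \quad Q\left(\frac{\mu^2 + \sigma^2}{\mu}\right) = \frac{\mu^2}{\mu^2 + \sigma^2}$$

is feasible for the primal Problem (7). It follows from Proposition 2.2 that $Q$ is an optimal primal
solution. Combining the above and simplifying the relevant algebra, we conclude that in this case

$$\psi(x) = \psi_2(x) := \frac{(h + c)\sigma^2 - (b - c)\mu^2}{\mu^2 + \sigma^2}\, x + b\mu. \tag{58}$$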

We now use the above to complete the proof of the theorem. Note that since by assumption $b > c$, it
follows from (55) that $\arg\min_{x \in \mathbb{R}} \psi(x) \subseteq \mathbb{R}_+$. Recall that $\kappa = \frac{b - h - 2c}{b + h}$. Furthermore, our assumptions,
i.e. $b > c$, $h + c > 0$, imply that $|\kappa| < 1$. Let $\chi := \mu + \kappa\sigma(1 - \kappa^2)^{-1/2}$. It follows from a straightforward
calculation that $\psi_1$ is a strictly convex function on $\mathbb{R}$, and $\psi_1'(\chi) = 0$, i.e. $\psi_1$ is strictly decreasing
on $(-\infty, \chi)$, and strictly increasing on $(\chi, \infty)$. Furthermore, it follows from a similar calculation
that

$$\frac{b - c}{h + c} - \frac{\sigma^2}{\mu^2} \ \text{ is the same sign as } \ \chi - \frac{\mu^2 + \sigma^2}{2\mu}. \tag{59}$$
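The two "straightforward calculations" invoked here can be sketched as follows. This is one possible derivation, writing $\kappa = \frac{b-h-2c}{b+h}$ and $\chi$ for the stationary point of $\psi_1$, and assuming $\sigma > 0$, $\mu > 0$ and $b + h > 0$ (the last follows from $b > c$ and $h + c > 0$).

```latex
\begin{align*}
\psi_1'(x) &= \frac{b+h}{2}\,\frac{x-\mu}{f(x)} - \frac{b-h-2c}{2},
\qquad
\psi_1''(x) = \frac{b+h}{2}\,\frac{\sigma^2}{f(x)^3} > 0, \\[4pt]
\psi_1'(x) = 0
 &\iff \frac{x-\mu}{f(x)} = \kappa
 \iff (x-\mu)^2 (1-\kappa^2) = \kappa^2 \sigma^2
      \ \text{and}\ \operatorname{sign}(x-\mu) = \operatorname{sign}(\kappa) \\
 &\iff x = \mu + \kappa\sigma (1-\kappa^2)^{-1/2} = \chi. \\[4pt]
\text{At } x_0 := \frac{\mu^2+\sigma^2}{2\mu}:\quad
 f(x_0) &= x_0, \qquad
 \psi_1'(x_0) = \frac{b+h}{2}\,\frac{\sigma^2-\mu^2}{\sigma^2+\mu^2} - \frac{b-h-2c}{2}
            = \frac{(h+c)\sigma^2 - (b-c)\mu^2}{\mu^2+\sigma^2}.
\end{align*}
```

Since $\psi_1$ is strictly convex, $\chi > x_0$ exactly when $\psi_1'(x_0) < 0$, i.e. when $\frac{\sigma^2}{\mu^2} < \frac{b-c}{h+c}$, which is the sign relation in (59).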

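We now proceed by a case analysis. First, suppose $\frac{\sigma^2}{\mu^2} > \frac{b - c}{h + c}$. In this case, $\psi_2$ is a linear function
with strictly positive slope, and thus $\arg\min_{x \in \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right]} \psi(x) = \{0\}$. Furthermore, it follows from
(59) that $\chi < \frac{\mu^2 + \sigma^2}{2\mu}$, where $\chi$ is the minimizer of $\psi_1$, which implies that $\psi_1$ is strictly increasing on
$\left[\frac{\mu^2 + \sigma^2}{2\mu}, \infty\right)$. It follows from the continuity of $\psi$ that $\arg\min_{x \geq \frac{\mu^2 + \sigma^2}{2\mu}} \psi(x) = \left\{\frac{\mu^2 + \sigma^2}{2\mu}\right\}$. Combining the
above, we conclude that $\arg\min_{x \in \mathbb{R}} \psi(x) = \{0\}$.

Next, suppose $\frac{\sigma^2}{\mu^2} < \frac{b - c}{h + c}$. In this case, $\psi_2$ is a linear function with strictly negative slope, and thus
$\arg\min_{x \in \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right]} \psi(x) = \left\{\frac{\mu^2 + \sigma^2}{2\mu}\right\}$. Furthermore, it follows from (59) that $\chi > \frac{\mu^2 + \sigma^2}{2\mu}$, which implies
that $\arg\min_{x \geq \frac{\mu^2 + \sigma^2}{2\mu}} \psi(x) = \{\chi\}$. Combining the above, we conclude that $\arg\min_{x \in \mathbb{R}} \psi(x) = \{\chi\}$.

Finally, suppose that $\frac{\sigma^2}{\mu^2} = \frac{b - c}{h + c}$. In this case, $\psi_2$ is a constant function, and thus
$\arg\min_{x \in \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right]} \psi(x) = \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right]$. Furthermore, it follows from (59) that $\chi = \frac{\mu^2 + \sigma^2}{2\mu}$, which implies
that $\arg\min_{x \geq \frac{\mu^2 + \sigma^2}{2\mu}} \psi(x) = \left\{\frac{\mu^2 + \sigma^2}{2\mu}\right\}$. Combining the above, we conclude that
$\arg\min_{x \in \mathbb{R}} \psi(x) = \left[0, \frac{\mu^2 + \sigma^2}{2\mu}\right]$.

Combining all of the above with another straightforward calculation completes the proof of the
theorem. □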
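The case analysis can be sanity-checked by brute force. The sketch below assembles $\psi$ piecewise from (55), (57) and (58), taking $\psi_2$ to be linear with slope $\frac{(h+c)\sigma^2 - (b-c)\mu^2}{\mu^2+\sigma^2}$ and intercept $b\mu$, and compares a grid search against the closed-form minimizer for the two strict cases. Function names, the grid, and the parameter values are assumptions made for illustration only.

```python
import math

def psi(x, mu, sigma, c, b, h):
    """Worst-case expected cost, pieced together from (55), (57), (58)."""
    if x < 0:
        return c * x + b * (mu - x)                              # (55)
    m2 = mu**2 + sigma**2
    if x >= m2 / (2 * mu):
        fx = math.hypot(x - mu, sigma)
        return (c * mu + 0.5 * (b + h) * fx
                - 0.5 * (b - h - 2 * c) * (x - mu))              # (57)
    slope = ((h + c) * sigma**2 - (b - c) * mu**2) / m2
    return slope * x + b * mu                                    # (58)

def minimizer(mu, sigma, c, b, h):
    """Closed-form minimizer for the two strict cases of the case analysis.

    In the boundary case (equality) the argmin is an interval; this helper
    then returns its right endpoint.
    """
    if sigma**2 / mu**2 > (b - c) / (h + c):
        return 0.0
    kappa = (b - h - 2 * c) / (b + h)
    return mu + kappa * sigma / math.sqrt(1 - kappa**2)

def grid_argmin(mu, sigma, c, b, h, lo=-1.0, hi=10.0, n=110001):
    """Brute-force argmin of psi over an evenly spaced grid (hypothetical helper)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=lambda x: psi(x, mu, sigma, c, b, h))

# Low-variance instance (assumed values): sigma^2/mu^2 = 0.25 < (b-c)/(h+c) = 1.5.
low_var = dict(mu=2.0, sigma=1.0, c=1.0, b=4.0, h=1.0)
# High-variance instance (assumed values): sigma^2/mu^2 = 4 > (b-c)/(h+c) = 0.5.
high_var = dict(mu=1.0, sigma=2.0, c=1.0, b=2.0, h=1.0)

for params in (low_var, high_var):
    print(grid_argmin(**params), minimizer(**params))
```

In the high-variance instance the grid search lands at zero, matching the conclusion that ordering nothing is optimal when demand variability is large relative to the cost ratio.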



6.2. Proof of Proposition 2.1

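Proof of Proposition 2.1. We know that $\delta_0$ is an optimal solution to Problem (9) with value $b\mu$.
We now prove that it is the unique optimal solution. Let $\delta := \frac{\sigma^2}{\mu^2 + \sigma^2}$, $\tau := \frac{\mu^2 + \sigma^2}{\mu}$. Let $Q^*$ be the
probability measure such that

$$Q^*(0) = \delta, \quad Q^*(\tau) = 1 - \delta.$$

Recall that $b - c > 0$, and $(h + c)\sigma^2 > (b - c)\mu^2$, which we denote by assumption A1. Note that the
value of any feasible solution $Q_1$ to Problem (9) is at least $E_{Q_1 \times Q^*}[\Psi(D_1, D_2)]$, which itself equals
the sum of $c\mu$ and

$$E_{Q_1}\Big[\Big(\delta\big((b-c)[0 - D_1]_+ + (h+c)[D_1 - 0]_+\big) + (1-\delta)\big((b-c)[\tau - D_1]_+ + (h+c)[D_1 - \tau]_+\big)\Big) I(D_1 > 0)\Big] \tag{60}$$

$$+\, E_{Q_1}\Big[\Big(\delta\big((b-c)[0 - D_1]_+ + (h+c)[D_1 - 0]_+\big) + (1-\delta)\big((b-c)[\tau - D_1]_+ + (h+c)[D_1 - \tau]_+\big)\Big) I(D_1 < 0)\Big] \tag{61}$$

$$+\, E_{Q_1}\Big[\Big(\delta\big((b-c)[0 - D_1]_+ + (h+c)[D_1 - 0]_+\big) + (1-\delta)\big((b-c)[\tau - D_1]_+ + (h+c)[D_1 - \tau]_+\big)\Big) I(D_1 = 0)\Big]. \tag{62}$$

Note that if $P(D_1 > 0) > 0$, then (60) is at least

$$E\Big[\frac{\sigma^2}{\mu^2 + \sigma^2}(h + c) D_1 + \frac{\mu^2}{\mu^2 + \sigma^2}(b - c)\Big(\frac{\mu^2 + \sigma^2}{\mu} - D_1\Big) \,\Big|\, D_1 > 0\Big] P(D_1 > 0)$$

$$> E\Big[\frac{\mu^2}{\mu^2 + \sigma^2}(b - c) D_1 + \frac{\mu^2}{\mu^2 + \sigma^2}(b - c)\Big(\frac{\mu^2 + \sigma^2}{\mu} - D_1\Big) \,\Big|\, D_1 > 0\Big] P(D_1 > 0) \quad \text{by A1}$$

$$= (b - c)\mu P(D_1 > 0). \tag{63}$$

Similarly, if $P(D_1 < 0) > 0$, then (61) is at least

$$E\Big[-\frac{\sigma^2}{\mu^2 + \sigma^2}(b - c) D_1 + \frac{\mu^2}{\mu^2 + \sigma^2}(b - c)\Big(\frac{\mu^2 + \sigma^2}{\mu} - D_1\Big) \,\Big|\, D_1 < 0\Big] P(D_1 < 0)$$

$$= E\big[(b - c)(\mu - D_1) \,\big|\, D_1 < 0\big] P(D_1 < 0) > (b - c)\mu P(D_1 < 0). \tag{64}$$

Similarly, if $P(D_1 = 0) > 0$, then (62) equals $(b - c)\mu P(D_1 = 0)$. Combining with (63), (64), and
the fact that $\delta_0$ attains value $b\mu$ completes the proof. □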
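The pointwise inequality behind (63) and (64) is easy to probe numerically. The sketch below evaluates, for a fixed level $y$ playing the role of a realization of $D_1$, the expected cost $\delta\big((b-c)[0-y]_+ + (h+c)[y-0]_+\big) + (1-\delta)\big((b-c)[\tau-y]_+ + (h+c)[y-\tau]_+\big)$ against the two-point measure $Q^*$, and checks that under A1 it equals $(b-c)\mu$ at $y = 0$ and is strictly larger elsewhere. The function name and parameter values are assumptions for illustration.

```python
def cost_against_q_star(y, mu, sigma, c, b, h):
    """Expected value of (b-c)[D2 - y]_+ + (h+c)[y - D2]_+ when D2 ~ Q*.

    Q* puts mass delta on 0 and mass 1 - delta on tau, as in the proof.
    """
    delta = sigma**2 / (mu**2 + sigma**2)
    tau = (mu**2 + sigma**2) / mu

    def piece(d2):
        return (b - c) * max(d2 - y, 0.0) + (h + c) * max(y - d2, 0.0)

    return delta * piece(0.0) + (1.0 - delta) * piece(tau)

# Illustrative parameters (assumed) satisfying A1: (h+c) sigma^2 = 8 > (b-c) mu^2 = 1.
mu, sigma, c, b, h = 1.0, 2.0, 1.0, 2.0, 1.0
assert b - c > 0 and (h + c) * sigma**2 > (b - c) * mu**2

baseline = cost_against_q_star(0.0, mu, sigma, c, b, h)   # equals (b - c) * mu
others = [cost_against_q_star(y, mu, sigma, c, b, h)
          for y in (-2.0, -0.5, 0.1, 1.0, 5.0, 7.0)]
```

Adding $c\mu$ to `baseline` recovers the value $b\mu$ attained by $\delta_0$, while every nonzero level in `others` costs strictly more, which is the uniqueness claim in miniature.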
6.3. Proof of Theorem d

Proof. Recall that $\eta := \frac{1}{2}(c_1 + c_2)$, and $f(z) := \left((z - \mu)^2 + \sigma^2\right)^{1/2}$ for all $z \in \mathbb{R}$. Also,
letting $h_1(d) := -d + c_1$, $h_2(d) := d - c_2$ for all $d \in \mathbb{R}$, we have that $\Psi(x, d) = \max\{h_1(d), 0, h_2(d)\}$
for all $d \in \mathbb{R}$. Let $Q$ be the probability measure described in (12), and $\lambda = (\lambda_0, \lambda_1, \lambda_2)$ the vector
described in (13). Let $g(d) := \lambda_0 + \lambda_1 d + \lambda_2 d^2$. We now prove that $g(d) \geq \Psi(x, d)$ for all $d \in \mathbb{R}$. It
follows from a straightforward calculation that $g(d)$ is tangent to $h_1(d)$ at $d_1 := \eta - f(\eta)$, and $g(d)$
is tangent to $h_2(d)$ at $d_2 := \eta + f(\eta)$. Thus $g(d) \geq \max(h_1(d), h_2(d))$ for all $d \in \mathbb{R}$, and to prove the
desired claim it suffices to demonstrate that $g(d) \geq 0$ for all $d \in \mathbb{R}$. It is easily verified that for all
$d \in \mathbb{R}$,

$$g(d) = \frac{1}{2} f^{-1}(\eta)(d - \eta)^2 + \frac{1}{2}\left(f(\eta) + c_1 - c_2\right). \tag{65}$$

Recall that

$$\frac{1}{4}(2\mu - 3c_1 + c_2)(3c_2 - c_1 - 2\mu) < \sigma^2,$$

which we denote by assumption A2. It follows from another straightforward calculation that
assumption A2 is equivalent to requiring that $\frac{1}{2}\left(f(\eta) + c_1 - c_2\right) > 0$. Combining with (65), we
conclude that A2 implies $g(d) \geq 0$ for all $d \in \mathbb{R}$, completing the proof that $g(d) \geq \Psi(x, d)$ for all $d \in \mathbb{R}$.
Hence $\lambda$ is feasible for the dual Problem (10). Also, it is easily verified that $Q$ is feasible for the
primal Problem (7). It follows from Proposition 2.2 that $Q$ is an optimal primal solution, and $\lambda$
is an optimal dual solution. That these optimal solutions are unique then follows from the second
part of Proposition 2.2 and a straightforward contradiction argument. Combining the above and
simplifying the relevant algebra completes the proof. □
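The role of assumption A2 can be illustrated with a small numerical sketch. It evaluates the quadratic from (65), with $f^{-1}(\eta)$ read as $1/f(\eta)$, against the payoff $\max\{h_1(d), 0, h_2(d)\}$ on a grid: once for parameters satisfying A2, and once for parameters violating it, where the quadratic dips below zero at $d = \eta$. The function names and parameter values are assumptions, and the instances are chosen with $c_1 < c_2$.

```python
import math

def g65(d, mu, sigma, c1, c2):
    """Quadratic from (65): (d - eta)^2 / (2 f(eta)) + (f(eta) + c1 - c2) / 2."""
    eta = 0.5 * (c1 + c2)
    f_eta = math.hypot(eta - mu, sigma)
    return 0.5 * (d - eta)**2 / f_eta + 0.5 * (f_eta + c1 - c2)

def payoff(d, c1, c2):
    """max{h1(d), 0, h2(d)} with h1(d) = -d + c1 and h2(d) = d - c2."""
    return max(-d + c1, 0.0, d - c2)

def a2_holds(mu, sigma, c1, c2):
    """Assumption A2: (1/4)(2mu - 3c1 + c2)(3c2 - c1 - 2mu) < sigma^2."""
    return 0.25 * (2 * mu - 3 * c1 + c2) * (3 * c2 - c1 - 2 * mu) < sigma**2

def majorizes_on_grid(mu, sigma, c1, c2, lo=-10.0, hi=10.0, n=4001):
    """Check g(d) >= payoff(d) on an evenly spaced grid (hypothetical helper)."""
    ds = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return all(g65(d, mu, sigma, c1, c2) >= payoff(d, c1, c2) - 1e-12 for d in ds)

# Illustrative instances (assumed values), both with eta = 1.5.
good = dict(mu=1.5, sigma=2.0, c1=1.0, c2=2.0)   # A2 holds: 1 < 4
bad = dict(mu=1.5, sigma=0.5, c1=1.0, c2=2.0)    # A2 fails: 1 >= 0.25

print(a2_holds(**good), majorizes_on_grid(**good))
print(a2_holds(**bad), majorizes_on_grid(**bad))
```

In the second instance the constant term of (65) is negative, so the quadratic falls below the zero branch of the payoff near $\eta$ and dual feasibility is lost, which is why A2 is needed.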



